Map, analyze and search phone number

Sindre Sorhus
What is the best way to map, analyze and search a field with a phone number?

I have phone numbers in various formats.
+47 23546798
+47 23 54 67 98
+47 235 46 798
+4723546798
23546798

I need to be able to search for "23546798" or "2354" and find all the phone numbers above. What kind of analyzers should I use?

Any thoughts about ES having a built-in phone number field?

Re: Map, analyze and search phone number

Otis Gospodnetic
Hi,

This is essentially a search for an arbitrary substring.  You can use
n-grams for that.
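
A minimal sketch of the n-gram idea in plain Python, outside of Elasticsearch: strip everything that is not a digit, then index every digit substring between a minimum and a maximum length. The function names and the gram sizes (2 to 10) are assumptions chosen for illustration, not settings recommended in this thread.

```python
import re
from collections import defaultdict


def digit_ngrams(text, min_gram=2, max_gram=10):
    """Return every digit substring of length min_gram..max_gram."""
    digits = re.sub(r"\D", "", text)  # drop "+", spaces and other non-digits
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(digits) - size + 1):
            grams.add(digits[start:start + size])
    return grams


def build_index(numbers):
    """Map each gram to the set of original strings that contain it."""
    index = defaultdict(set)
    for number in numbers:
        for gram in digit_ngrams(number):
            index[gram].add(number)
    return index


def search(index, query):
    """Look the digit-only form of the query up as a single gram."""
    return index.get(re.sub(r"\D", "", query), set())


numbers = ["+47 23546798", "+47 23 54 67 98", "+47 235 46 798",
           "+4723546798", "23546798"]
index = build_index(numbers)
print(len(search(index, "2354")))      # all five formats match
print(len(search(index, "23546798")))  # all five formats match
```

In this sketch a query longer than max_gram digits finds nothing, and a wider gram range makes the index larger, so the two sizes are a trade-off between index size and the query lengths that need to match.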

Otis
--
Sematext is hiring Search Engineers -- http://sematext.com/about/jobs.html


On Aug 8, 4:13 pm, Sindre Sorhus <[hidden email]> wrote:

> What is the best way to map, analyze and search a field with a phone number?
>
> I have phone numbers in various formats.
> +47 23546798
> +47 23 54 67 98
> +47 235 46 798
> +4723546798
> 23546798
>
> I need to be able to search for "23546798" or "2354" and find all the phone
> numbers above. What kind of analyzers should I use?
>
> Any thoughts about ES having a built-in phone number field?

Re: Map, analyze and search phone number

kimchy
Administrator
In reply to this post by Sindre Sorhus
There was another thread where we talked a bit about having a specific phone number field that would know about different phone number formats and index them in a form that makes them more searchable (possibly as a numeric field; it really depends on the type of search that needs to be executed). It gets complicated when it comes to internationalization and the like, and might not fit all cases, but we can try to start with something and see how it goes...

On Mon, Aug 8, 2011 at 11:13 PM, Sindre Sorhus <[hidden email]> wrote:
What is the best way to map, analyze and search a field with a phone number?

I have phone numbers in various formats.
+47 23546798
+47 23 54 67 98
+47 235 46 798
+4723546798
23546798

I need to be able to search for "23546798" or "2354" and find all the phone numbers above. What kind of analyzers should I use?

Any thoughts about ES having a built-in phone number field?

Re: Map, analyze and search phone number

yark
Wouldn't just a new analyzer solve the problem?

We need an analyzer that ignores all non-digit characters and treats each remaining character as a separate token, don't we?
Then a query for any part of the number, run as a "phrase", will match.
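
A rough sketch of that idea in plain Python, assuming "phrase" means that the query's tokens must appear consecutively and in order. The helper names are hypothetical and this is not an Elasticsearch analyzer, only an illustration of the matching rule.

```python
import string


def digit_tokens(text):
    """Emit each ASCII digit as its own token, ignoring every other character."""
    return [ch for ch in text if ch in string.digits]


def phrase_match(document, query):
    """True if the query's digit tokens occur consecutively in the document."""
    doc, qry = digit_tokens(document), digit_tokens(query)
    if not qry:
        return False  # a query without digits matches nothing
    return any(doc[i:i + len(qry)] == qry
               for i in range(len(doc) - len(qry) + 1))


for stored in ["+47 23546798", "+47 23 54 67 98", "+4723546798", "23546798"]:
    print(stored, phrase_match(stored, "2354"), phrase_match(stored, "235 46 798"))
```

Because both the stored value and the query are reduced to digit tokens, the spacing used on either side does not affect the result.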

On 9 Aug 2011, at 12:08, Shay Banon wrote:

There was another thread where we talked a bit about having a specific phone number field that would know about different phone number formats and index them in a form that makes them more searchable (possibly as a numeric field; it really depends on the type of search that needs to be executed). It gets complicated when it comes to internationalization and the like, and might not fit all cases, but we can try to start with something and see how it goes...

Re: Map, analyze and search phone number

kimchy
Administrator
It depends, since the format of phone numbers can vary. Also, it would be nice to have an option to "extract" phone numbers from text.

2011/8/9 Yaroslav <[hidden email]>
Wouldn't just a new analyzer solve the problem?

We need an analyzer that ignores all non-digit characters and treats each remaining character as a separate token, don't we?
Then a query for any part of the number, run as a "phrase", will match.

Re: Map, analyze and search phone number

Karel Minařík
In reply to this post by Otis Gospodnetic
Couldn't a regex-based analyzer be used for extracting phone numbers? I
guess it would be a bit more lightweight than n-grams.
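
A small sketch of regex-based extraction in plain Python. The pattern is an assumption made up for this illustration (an optional "+", then at least eight digits, each optionally preceded by one space); it is not a complete phone number grammar and not an Elasticsearch analyzer.

```python
import re

# Hypothetical pattern: optional "+", then at least eight digits, each
# optionally preceded by a single space or non-breaking space.
PHONE_RE = re.compile(r"\+?\d(?:[ \u00a0]?\d){7,}")


def extract_phone_numbers(text):
    """Return digit-only forms of the phone-like runs found in text."""
    return [re.sub(r"\D", "", match) for match in PHONE_RE.findall(text)]


text = "Call +47 23 54 67 98 or 23546798 before 5 pm."
print(extract_phone_numbers(text))  # ['4723546798', '23546798']
```

A pattern this simple would merge two numbers separated by a single space and knows nothing about country-specific lengths, so it only covers the extraction step for formats like the ones in this thread.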

Karel

On Aug 9, 9:03 am, Otis Gospodnetic <[hidden email]>
wrote:

> Hi,
>
> This is essentially a search for an arbitrary substring.  You can use
> n-grams for that.
>
> Otis
> --
> Sematext is hiring Search Engineers -- http://sematext.com/about/jobs.html

Re: Map, analyze and search phone number

Sindre Sorhus
In reply to this post by kimchy
That would be really useful: I wouldn't have to parse the phone numbers myself, and they would be easily searchable.

Re: Map, analyze and search phone number

Sindre Sorhus
In reply to this post by kimchy
Something like the iOS data detectors. This could work on more than phone numbers (dates, locations, ...), but it depends on how big you want to make it.

Re: Map, analyze and search phone number

Sindre Sorhus
In reply to this post by Otis Gospodnetic
How would that work with phone numbers that contain whitespace? What should I set for the min and max gram sizes?

Re: Map, analyze and search phone number

Otis Gospodnetic
I haven't followed this thread closely, but it looks like there are 2
separate things here:

1. phone number detection/extraction
2. phone number tokenization and search

Can't 1. be handled with something like GATE or any other NER tool?
After 1. is done, isn't searching arbitrary phone number substrings
just a matter of n-gramming?

Otis
--
Sematext is hiring Search Engineers -- http://sematext.com/about/jobs.html



On Aug 10, 3:37 am, Sindre Sorhus <[hidden email]> wrote:
> How would that work with phone numbers with whitespace? What should I set on
> the min-max grams?

Re: Map, analyze and search phone number

Sindre Sorhus
In reply to this post by kimchy
Should I file a bug on GitHub for it?

Re: Map, analyze and search phone number

David Sachs
Perhaps you could use/integrate http://code.google.com/p/libphonenumber/, Google's library for phone number parsing in a variety of locales/formats.  It's Apache-licensed as well.

David

Re: Map, analyze and search phone number

Sindre Sorhus
Yes, I know, that's what I ended up using. But still, it would be very convenient to have a "phonenumber" type in ES.

Re: Map, analyze and search phone number

Sindre Sorhus
In reply to this post by kimchy
Old discussion here: http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/459046042558bfeb

Re: Map, analyze and search phone number

Sindre Sorhus
In reply to this post by Otis Gospodnetic
I tried using nGrams, but I can't get it to work.

I have a number saved in ES as a string like this "48121245", but when I search for "481 21 245", it doesn't find anything.

What am I doing wrong?
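
One possible explanation, stated as an assumption because the thread shows neither the mapping nor the query: if only the indexed value is broken into n-grams and the query keeps its spaces, the query string never equals any indexed gram. The sketch below reproduces that mismatch in plain Python and shows that applying the same digit-only normalization on the query side makes the lookup succeed. The helper names and gram sizes are hypothetical, and this is not Elasticsearch's own analysis chain.

```python
import re


def normalize(text):
    """Keep digits only, so spacing differences disappear."""
    return re.sub(r"\D", "", text)


def ngrams(digits, min_gram=2, max_gram=10):
    """All substrings of digits with length min_gram..max_gram."""
    return {digits[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(digits) - n + 1)}


indexed = ngrams(normalize("48121245"))

raw_query = "481 21 245"
print(raw_query in indexed)             # False: the spaces are still there
print(normalize(raw_query) in indexed)  # True: same normalization on both sides
```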