
Which is the best (right) use of NGrams?


Which is the best (right) use of NGrams?

AlexR
Hello,

I was reading this group's posts and there seem to be two schools of thought on ngram use:

1. Index with an ngram-enabled analyzer but search with an analyzer without ngrams, so that complete search terms are matched against ngrams.
2. Index with ngrams and search with ngrams.

My understanding is:

#1 will require very long ngrams; there will be very few (one?) term matches per document, and the longer/rarer the matched ngram, the better the match. It essentially generates tons of "synonyms" (ngrams) for your searched field and matches your terms against them. One problem is that the maximum ngram length essentially has to exceed the longest word. That seems to be an issue: while a handful of characters is often enough to identify the document (think auto-complete), a search token longer than the maximum ngram length will return no hits.

#2 will need short ngrams, 3-5 characters at most, and will match the n-grammed search term against the ngrammed field in the index. The more matches, the better the score. The precision is probably not as good as #1, so it would need to be combined with a search on the original field and maybe a shingled field, but it can potentially handle simple typos.
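To make the two setups concrete, here is a minimal sketch (analyzer and field names are made up, and the gram lengths are just illustrative; #1 would need a much longer max_gram, as described above):

  {
    "settings": {
      "analysis": {
        "filter": {
          "my_grams": { "type": "nGram", "min_gram": 3, "max_gram": 5 }
        },
        "analyzer": {
          "ngram_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "my_grams"]
          },
          "plain_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase"]
          }
        }
      }
    },
    "mappings": {
      "doc": {
        "properties": {
          // #1: ngrams at index time only, whole terms at search time
          "name_approach_1": {
            "type": "string",
            "index_analyzer": "ngram_analyzer",
            "search_analyzer": "plain_analyzer"
          },
          // #2: the same ngram analyzer at both index and search time
          "name_approach_2": {
            "type": "string",
            "analyzer": "ngram_analyzer"
          }
        }
      }
    }
  }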

I have two use cases (both to be used in auto-complete pick lists):

1. A long identifier (contract number), 10-30 characters, which needs to be searched on any part of it.
2. A company name, which needs to be searched on individual words from the start of each word (could use a phrase prefix query or edgeNGram).

Could you please share your opinions on #1 and #2 (and any other techniques you have used) and their applicability to my cases?

Thank you,
Alex


Re: Which is the best (right) use of NGrams?

egaumer
The general approach is to index ngrams in a separate field and then craft a query that searches on both fields but boosts matches on the non-ngram field. This way you match on partial words (ngrams) but favor matches on whole tokens. This is generally where DisMax is useful because the query plays an important role in fine-tuning the relevance.
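For example, something of this shape (field names assumed, with ngrams indexed into a "name.ngram" subfield):

  {
    "dis_max": {
      "queries": [
        { "match": { "name":       { "query": "acme corp", "boost": 3 } } },
        { "match": { "name.ngram": { "query": "acme corp" } } }
      ]
    }
  }

Whole-token matches on "name" score higher, while the ngram subfield still catches partial-word matches.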

-Eric




Re: Which is the best (right) use of NGrams?

AlexR

Thank you Eric. I understand that, but you can use them in two ways, as per my post.


Re: Which is the best (right) use of NGrams?

Matt Weber-2
For autocomplete I typically use:

- whitespace tokenizer
- word delimiter token filter
- edge-ngram token filter

At query time, I do not perform the edge-ngrams. This approach will
work for your 2nd use case, but your first use case is kind of tricky.
I would index that field twice; the first field would use:

- keyword tokenizer
- edge ngram

The 2nd field would use:

- keyword tokenizer
- reverse token filter
- edge ngram

Again, skip the edge-ngrams at query time.  This will allow prefix
matching and suffix matching on your contract number.  For a contract
number of 12345, you will get it as a suggestion for queries of 12 or
345.
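Roughly, the analysis settings for that would look like the following
(analyzer names and gram lengths are placeholders):

  {
    "analysis": {
      "filter": {
        "front_grams": { "type": "edgeNGram", "min_gram": 1, "max_gram": 20 }
      },
      "analyzer": {
        // company names: whitespace + word delimiter + edge ngrams
        "autocomplete_index": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["word_delimiter", "lowercase", "front_grams"]
        },
        // contract number, field 1: whole value, front-anchored grams
        "contract_prefix_index": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "front_grams"]
        },
        // contract number, field 2: reversed value, so the grams anchor
        // at the end; the matching search analyzer should be keyword +
        // lowercase + reverse, with no grams
        "contract_suffix_index": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "reverse", "front_grams"]
        }
      }
    }
  }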

Hope this helps.

Thanks,
Matt Weber



Re: Which is the best (right) use of NGrams?

egaumer
In reply to this post by AlexR
Doing ngram analysis on the query side will usually introduce a lot of noise (i.e., relevance is bad). 

The problem with auto-suggest is that it's hard to get relevance tuned just right because you're usually matching against very small text fragments. At the same time, relevance is really subjective, making it hard to measure with any real accuracy. Doing ngram analysis on the query side exacerbates the problem, in my experience. That said, use cases differ, as does the quality of the data driving the auto-suggest, and that can cause your mileage to vary.

If I had doubts I'd just test both cases against my actual data and requirements. That'll provide a more definitive answer.

-Eric



Re: Which is the best (right) use of NGrams?

AlexR
In reply to this post by Matt Weber-2
Thanks Matt! 

That is what I was going to do before I found an older thread about using short ngrams at both index and search time. I was intrigued whether it would work well. Ed's experience (the post below) is that it does not work very well. I am going to try it just to have some first-hand experience, but I am pretty sure the approach you outlined is the one I will go with.

One question I have is whether you index forward and reverse edge ngrams into the same field or two separate ones, particularly as it relates to highlighting. Will highlighting work if I index both into the same field?

Thanks 

Alex


Re: Which is the best (right) use of NGrams?

AlexR
In reply to this post by egaumer
Thanks Ed. I suspected as much, but as you suggested I will do a quick test. Maybe the nature of the data (the beginning of the alphanumeric contract number is usually a good deal less unique than the end) will make it work well.


Re: Which is the best (right) use of NGrams?

Matt Weber-2
It will need to be two fields: one normal, one reversed.  You are going
to need to experiment with highlighting...  I have a feeling that is
going to give you some mixed results.  BTW, the other poster is Eric,
not Ed.  :)
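
A rough sketch of the two-field mapping, reusing the analyzer names from
my earlier message (the search-side analyzers here are assumptions:
plain keyword + lowercase, plus a reverse filter for the suffix field):

  "contract": {
    "type": "multi_field",
    "fields": {
      "contract": { "type": "string", "index": "not_analyzed" },
      "prefix": {
        "type": "string",
        "index_analyzer": "contract_prefix_index",
        "search_analyzer": "plain_keyword"
      },
      "suffix": {
        "type": "string",
        "index_analyzer": "contract_suffix_index",
        "search_analyzer": "plain_keyword_reversed"
      }
    }
  }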


Re: Which is the best (right) use of NGrams?

Lukáš Vlček
Hi there,

Interesting, I was experimenting with a very similar use case (search suggestions on a [possibly] short list of one-to-few-word codes) with highlighting. It seems to be working fine, and I can share more details if you are interested (though I would like to check a couple of details first to make sure it is not buggy). My only concern is that my approach would not scale well for large data (I am not using edgeNGrams but nGrams).

Regards,
Lukas


Re: Which is the best (right) use of NGrams?

AlexR
In reply to this post by AlexR
Sorry Eric :-( I was talking to Ed at the time and got the names confused...



Re: Which is the best (right) use of NGrams?

egaumer
No worries, I've been called much worse ;-) 

-Eric



Re: Which is the best (right) use of NGrams?

AlexR
In reply to this post by Lukáš Vlček
Hi Lukas,

It will be very interesting to compare notes. I will be out of town for a few days and may not be able to conclude my tests, so let's touch base next week if that's OK with you.

Alex


Re: Which is the best (right) use of NGrams?

AlexR
In reply to this post by Matt Weber-2
Matt,

My understanding is that prefix and suffix edge ngrams will only handle searching on a prefix or suffix, but not on an arbitrary internal substring of my contract number. I think I have to go with short ngrams at index and search time and use a match query with the "and" operator to ensure an (almost) precise match.

BTW, with suffix ngrams I do not think there is a need for the reverse filter, as edgeNGram can be applied from the end of the word ("side": "back"). As I understand it, reversing (at index and search time) is needed to get back-aligned ngrams, but since they are supported directly there is no need to use it.
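
Concretely, the two pieces I mean (a sketch; filter and field names are made up):

  // edge ngrams anchored at the end of the token, no reverse filter needed
  "suffix_grams": {
    "type": "edgeNGram",
    "min_gram": 2,
    "max_gram": 30,
    "side": "back"
  }

  // short-ngram field queried with match + "and": every gram of the
  // search text must be present in the document
  {
    "match": {
      "contract.ngrams": {
        "query": "NCU315",
        "operator": "and"
      }
    }
  }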



Re: Which is the best (right) use of NGrams?

AlexR
In reply to this post by Lukáš Vlček
Hi Lukas,

I did a bit of testing and I see several approaches for autocomplete-style search on ANY part of a long identifier string (i.e., a contract number):

1. Indexing and searching with short ngrams, querying with a match query and the "and" operator.
2. Searching by prefix or suffix (not any part): indexing twice, with front and back edge ngrams, and searching using a term query on un-ngrammed criteria.
3. Using long ngrams (say from 3 to 40 characters in my case), with the maximum longer than the longest indexed contract number, and searching with un-ngrammed criteria.

#1 works fine and highlights well. There may possibly be some false hits, but it should be pretty accurate in my case of searching contract numbers.

#2 works fine, but it is limited to prefix/suffix searches.

#3 searches fine, but highlighting is very erratic: sometimes it highlights the hits and sometimes it does not. Looks like a bug to me unless I am missing something.

Another option is to index back (reverse-aligned) edge ngrams and run a prefix search against the result. I have not tried it, but it should probably work well; not sure about highlighting though.

Would you share your findings?

Thank you,
Alex




Re: Which is the best (right) use of NGrams?

AlexR
Tested searching using A) match/and with short (3-char) ngrams (index and search time) vs. B) reverse (back-aligned) edge ngrams with a prefix query.
My search is over strings (contract numbers) of about 20-30 characters; when my search string is short (4-6 chars), A runs about 2-3 times faster than B. When I plug in 15-20 characters as my search string, A and B run at about the same speed.


Re: Which is the best (right) use of NGrams?

Mohammady Mahdy
Can you post a gist of a sample mapping / sample query?


Re: Which is the best (right) use of NGrams?

AlexR
Here it is:

  {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "analysis": {
        "filter": {
          "doc_key_edge_ngram_back": {
            "type": "edgeNGram",
            "min_gram": 4,
            "max_gram": 40,
            "side": "back"
          },
          "doc_key_ngram_short": {
            "min_gram": 4,
            "max_gram": 4,
            "type": "nGram"
          }
        },
        "analyzer": {
          "doc_key_partial": {
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "doc_key_edge_ngram_back"
            ],
            "type": "custom"
          },
          "doc_key_partial_short_ngram": {
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "doc_key_ngram_short"
            ],
            "type": "custom"
          },
          "doc_key": {
            "tokenizer": "keyword",
            "filter": [
              "lowercase"
            ],
            "type": "custom"
          }
        }
      }
    },
    "mappings": {
    
    ...
    
      "piid": {
        "type": "multi_field",
        "fields": {
          "piid": {
            "type": "string",
            "analyzer": "doc_key"
          },
          "partial": {
            "type": "string",
            "search_analyzer": "doc_key",
            "index_analyzer": "doc_key_partial",
            "include_in_all": false
          },
          "partial_sng": {
            "type": "string",
            "analyzer": "doc_key_partial_short_ngram",
            "include_in_all": false
          }
        }
      }

    ...
    }
  }


/* Here is what the data looks like:

HHSN272NCU31551
HHSN263200000052112B
*/

// Query using short ngrams (not 100% precise: it can get false positives on overlapping grams...)

{
  "match": {
    "piid.partial_sng": {
      "query": query,
      "operator": "and"
    }
  }
}

// Query using prefix on reverse edge ngrams

{
  "prefix": {
    "award.piid.partial": query
  }
}





Re: Which is the best (right) use of NGrams?

Mohammady Mahdy
Interesting. I will play around with it and post back if I can find a (fast) way to get rid of the false positives! Thanks for sharing.

Which one did you end up using? Or are you still in the research phase?


Re: Which is the best (right) use of NGrams?

AlexR
Still prototyping. For now I use the prefix query on back-aligned edge ngrams. I will experiment some more. I wonder how match_phrase works against short ngrams vs. match/and.
Please share your findings.
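
For the record, the phrase variant I mean is just this (same field as in the gist above; it assumes the nGram filter emits the grams at consecutive positions, so requiring them in order should cut down on the false positives that match/and allows):

  {
    "match_phrase": {
      "piid.partial_sng": "52112B"
    }
  }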
