How can I make this search requirement work?

How can I make this search requirement work?

mooky
I have a bit of an odd requirement as far as analyzers are concerned, and I am wondering if anyone has any tips or suggestions.
I have an item I am indexing (grade) that has a property (name) whose value can be "0# (99.995%)".
I am doing a prefix search on _all.
I want users to be able to search using 99 or 99.9 or 99.995 or 99.995%.
I also want the user to be able to copy-paste "0# (99.995%)" and it should work.

I am currently using the whitespace analyzer, which works for many of my cases except the tricky one above.
Searching for 99.995 doesn't work.
But "(99.995" does, because after whitespace tokenization the token begins with "(".
I could filter out the "(" and ")" characters, but then "0# (99.995%)" won't work.
Does anyone have any other suggestions?
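
The behaviour described above can be reproduced with a small sketch outside Elasticsearch. This is only a model: it assumes the whitespace analyzer behaves like Python's `str.split()` and that a prefix search simply checks whether any indexed token starts with the typed text. The function names are made up for illustration.

```python
def whitespace_tokens(text):
    # Stand-in for the whitespace analyzer: split on runs of whitespace only,
    # leaving punctuation such as "(" and "%" attached to the tokens.
    return text.split()


def prefix_hits(prefix, text):
    # Stand-in for a prefix search: return the indexed tokens that start
    # with the text the user typed.
    return [token for token in whitespace_tokens(text) if token.startswith(prefix)]


name = "0# (99.995%)"
print(whitespace_tokens(name))       # ['0#', '(99.995%)']
print(prefix_hits("99.995", name))   # [] because the token starts with "("
print(prefix_hits("(99.995", name))  # ['(99.995%)']
```

The number never appears at the start of a token, so no prefix of it can match.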

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9813b93a-249d-41a9-be21-12c8ec5d6d23%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: How can I make this search requirement work?

vineeth mohan-2
Hello Mooky,

You can apply multiple analyzers to a field with this plugin: https://github.com/yakaz/elasticsearch-analysis-combo/

You can list all of your analyzers there and apply them together.

Thanks
          Vineeth



Re: How can I make this search requirement work?

Glen Smith
In reply to this post by mooky
I would start by suggesting that you create an indexing/querying analyzer specifically for the field you know has this format.

Otherwise, I think your likeliest path to success is somewhere in the character filters domain.
Character filters are applied to the string before the tokenizer:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-analyzers.html

One possibility here is a pattern replace char filter:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html

If you can write a pattern that matches all of the allowed values of this field and replaces them with just the number,
and apply that pattern at both index and search time, then you are only dealing with searching for the numbers.

You may need a different character filter for the search analyzer, though, since you are allowing for more formats at search time than
are found in the source document field.
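
As a rough illustration of the character-filter idea, here is a sketch that uses a simpler variant than the one described above: instead of reducing the value to just the number, it only strips "(" and ")" before tokenizing, and applies the same step to both the indexed value and the query. The settings dictionary uses the `pattern_replace` char filter type; the names `strip_parens` and `grade_name` are made up, and the Python functions are only a stand-in for what Elasticsearch would do. It also assumes every query word is treated as a prefix and all words must match.

```python
import re

# Hypothetical analysis settings: strip parentheses, then tokenize on whitespace.
CHAR_FILTER_SETTINGS = {
    "analysis": {
        "char_filter": {
            "strip_parens": {
                "type": "pattern_replace",
                "pattern": "[()]",
                "replacement": "",
            }
        },
        "analyzer": {
            "grade_name": {
                "type": "custom",
                "char_filter": ["strip_parens"],
                "tokenizer": "whitespace",
                "filter": ["lowercase"],
            }
        },
    }
}


def analyze(text):
    # Local model of the analyzer above: char filter, whitespace split, lowercase.
    stripped = re.sub(r"[()]", "", text)
    return [token.lower() for token in stripped.split()]


def prefix_match(query, document):
    # Every analyzed query word must be a prefix of some indexed token.
    indexed = analyze(document)
    return all(any(tok.startswith(word) for tok in indexed) for word in analyze(query))


name = "0# (99.995%)"
for query in ["99", "99.9", "99.995", "99.995%", "0# (99.995%)"]:
    print(query, prefix_match(query, name))
```

Because the same filter runs on the query, the copy-pasted form is reduced to the same tokens as the indexed value, which is why filtering on both sides avoids the problem with filtering on one side only.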




Re: How can I make this search requirement work?

mooky
In reply to this post by vineeth mohan-2
Thanks. That looks interesting!



Re: How can I make this search requirement work?

mooky
In reply to this post by vineeth mohan-2
I think I can probably use a combo of the whitespace* and standard analyzers.

My current analyzer settings are:

{
    "analysis": {
        "analyzer": {
            "default_index": {
                "tokenizer": "whitespace",
                "filter": ["lowercase"]
            },
            "default_search": {
                "tokenizer": "whitespace",
                "filter": ["lowercase"]
            }
        }
    }
}
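
To see why combining the two analyzers might cover all the cases, here is a simplified model in Python. It is not the combo plugin itself: `standard_like_tokens` is only a regex approximation of the standard analyzer, and the matching rule (each query word is expanded by both analyzers, and any variant may prefix-match any indexed token) is an assumption about how the combined tokens behave at search time.

```python
import re


def whitespace_tokens(text):
    # Whitespace tokenizer plus lowercase filter, as in the settings above.
    return [token.lower() for token in text.split()]


def standard_like_tokens(text):
    # Rough approximation of the standard analyzer: word characters,
    # keeping dots that sit between them (so "99.995" stays whole).
    return [token.lower() for token in re.findall(r"\w+(?:\.\w+)*", text)]


def combo_tokens(text):
    # Union of both token streams, without duplicates, in first-seen order.
    tokens = []
    for token in whitespace_tokens(text) + standard_like_tokens(text):
        if token not in tokens:
            tokens.append(token)
    return tokens


def matches(query, document):
    indexed = combo_tokens(document)
    for word in query.split():
        variants = combo_tokens(word)
        if not any(tok.startswith(v) for v in variants for tok in indexed):
            return False
    return True


name = "0# (99.995%)"
print(combo_tokens(name))  # ['0#', '(99.995%)', '0', '99.995']
for query in ["99", "99.9", "99.995", "99.995%", "0# (99.995%)"]:
    print(query, matches(query, name))
```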


-M



Re: How can I make this search requirement work?

mooky
In reply to this post by mooky
And it works a treat. Thanks.

It leads me to think that it would be very useful to use a series of specialist (special-case) analyzers in conjunction with the standard analyzer.

Back to my original example, "0# (99.995%)": what I really want is something that will extract "99.995%".
The standard analyzer will extract "99.995" (and the rest of the text), while the whitespace analyzer will extract "(99.995%)".

Does a financial/numeric/accounting analyzer already exist, i.e. something that extracts "99.995%" or "$44.5665" or "-45bps"?
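
For what it is worth, the kind of extraction asked about here can be sketched with a single regular expression. The pattern below is an assumption that covers only the three example shapes (an optional sign, an optional "$", digits with an optional decimal part, and an optional "%" or "bps" suffix). It is written for Python's `re` module and would need checking before being used in an Elasticsearch pattern.

```python
import re

# Hypothetical pattern: sign, dollar sign, number, then "%" or "bps".
FINANCIAL_TOKEN = re.compile(r"[-+]?\$?\d+(?:\.\d+)?(?:%|bps)?")


def extract_financial_tokens(text):
    # Return every substring that looks like a number with optional decoration.
    return FINANCIAL_TOKEN.findall(text)


print(extract_financial_tokens("0# (99.995%)"))  # ['0', '99.995%']
print(extract_financial_tokens("$44.5665"))      # ['$44.5665']
print(extract_financial_tokens("-45bps"))        # ['-45bps']
```

Note that the "0" in "0#" is also picked up as a number; whether that is wanted depends on the data.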

-M

Re: How can I make this search requirement work?

smonasco-2
A little late to the party, but I would have used a custom index analyzer with lowercase, pattern and edgengram, and a search analyzer with lowercase and pattern (maybe you have to flip lowercase and pattern).

With the pattern tokenizer you can specify a regex.
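
A minimal sketch of that suggestion, modelled in Python rather than as real analyzer settings. It assumes the pattern tokenizer is configured to split on whitespace and parentheses (the regex is made up for this example), that edge n-grams are produced only at index time, and that the search side looks up whole query tokens among the indexed n-grams.

```python
import re

# Assumed split pattern: break on runs of whitespace and parentheses.
SPLIT_PATTERN = re.compile(r"[\s()]+")


def tokenize(text):
    # Model of lowercase + pattern tokenizer (used at index and search time).
    return [token for token in SPLIT_PATTERN.split(text.lower()) if token]


def edge_ngrams(token, min_gram=1, max_gram=20):
    # Leading substrings of the token: "99.9" -> "9", "99", "99.", "99.9".
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def index_terms(text):
    # Index-time analysis: tokenize, then expand each token into edge n-grams.
    return {gram for token in tokenize(text) for gram in edge_ngrams(token)}


def matches(query, document):
    # Search-time analysis has no n-grams: each query token must be an indexed term.
    terms = index_terms(document)
    return all(token in terms for token in tokenize(query))


name = "0# (99.995%)"
for query in ["99", "99.9", "99.995", "99.995%", "0# (99.995%)"]:
    print(query, matches(query, name))
```

With the n-grams stored in the index, the query side no longer needs a prefix query, since a plain term lookup already behaves like one.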


Re: How can I make this search requirement work?

vineeth mohan-2
In reply to this post by mooky
Hello Mooky,

Elasticsearch is not domain-specific and hence won't extract these financial terms on its own.
You will need to write your own analyzer to provide this functionality.

Thanks
           Vineeth

