Advice on my approach to this search problem


Nick Hoffman
Hey guys. I've been using ES for 1-2 weeks now, and love it. Being very new to it, though, I've been piecing together bits of knowledge as I go along. I have a semi-working solution to a problem, but I'm sure that it's nowhere close to an ideal solution. How would you approach this problem?

First, the data: The app has catalogs, products, and items, all of which are related. When a user performs a search, though, only products need to be found. For example, if there's a catalog named "Transformers" and the user searches for "Transformers", all products in that catalog should be returned. To accomplish this, when indexing products, I'm nesting the related catalog and item data inside the product data. Eg:
{ name: "Optimus Prime", number: "TFG1S1-1",
  catalog: { name: "Series 1", number: "TFG1S1" },
  items: [ { name: "Optimus Prime" }, { name: "Instructions" } ]
}

When searching, some users will misspell a word (Eg: "Trasformers", missing the "n"), and some will provide a partial word (Eg: "Trans"). Despite this, users expect to receive search results. Eg: Products with a field that contains "Transformer" or "retransmit" could match.

My solution right now is this:
https://gist.github.com/0cbe6892b4bc720bda92

Do you have any suggestions for improvements? Thanks for your help!
Nick

Re: Advice on my approach to this search problem

Nick Hoffman
Bump!

Re: Advice on my approach to this search problem

Karussell
Spell check can be done via a dictionary:

https://github.com/elasticsearch/elasticsearch/issues/646

via Lucene 4:

https://github.com/elasticsearch/elasticsearch/issues/911

or with a phonetic analyzer:

http://stackoverflow.com/questions/6936256/elastic-search-implement-did-you-mean


Is that what you were after?

On 17 Oct., 16:34, Nick Hoffman <[hidden email]> wrote:
> Bump!

Re: Advice on my approach to this search problem

Clinton Gormley-2
In reply to this post by Nick Hoffman

On Sat, 2011-10-15 at 08:20 -0700, Nick Hoffman wrote:
> Hey guys. I've been using ES for 1-2 weeks now, and love it. Being
> very new to it, though, I've been piecing together bits of knowledge
> as I go along. I have a semi-working solution to a problem, but I'm
> sure that it's nowhere close to an ideal solution. How would you
> approach this problem?

Hi Nick

The reason your misspellings work is because you are using ngrams for
both your search and index analyzers.

This may, however, give your users weird results, eg the user searches
for "slave" and gets a result for "lavatory" instead.

I would consider making a few changes:
1) use edge ngrams rather than ngrams ie s,sl,sla,slav,slave
2) use the edge ngram analyzers only as your search_analyzer
3) for your misspellings, if you get no results, then retry
   the query using some fuzziness:  
  http://www.elasticsearch.org/guide/reference/query-dsl/text-query.html
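
To illustrate point 1, here is a minimal Python sketch of the difference between full ngrams and edge ngrams. It is not how ES tokenizes internally, and the gram lengths are assumptions (the gist's actual min/max settings are not shown in this thread).

```python
def ngrams(word, min_len=3, max_len=5):
    """All substrings of word whose length is between min_len and max_len."""
    word = word.lower()
    return {
        word[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(word) - n + 1)
    }


def edge_ngrams(word, min_len=1, max_len=5):
    """Prefixes of word only, e.g. s, sl, sla, slav, slave."""
    word = word.lower()
    return {word[:n] for n in range(min_len, min(max_len, len(word)) + 1)}


# Full ngrams on both sides: any shared substring produces a match.
shared_ngrams = ngrams("slave") & ngrams("lavatory")
print(sorted(shared_ngrams))

# Edge ngrams: only a shared prefix can produce a match.
shared_edges = edge_ngrams("slave") & edge_ngrams("lavatory")
print(sorted(shared_edges))
```

With these assumed lengths, the two words share the gram "lav" under full ngrams, but share nothing under edge ngrams.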

clint



Re: Advice on my approach to this search problem

Nick Hoffman
In reply to this post by Karussell
On Monday, 17 October 2011 10:56:54 UTC-4, Karussell wrote:
> spell check can be done via a dictionary
>
> https://github.com/elasticsearch/elasticsearch/issues/646
>
> ,lucene 4
>
> https://github.com/elasticsearch/elasticsearch/issues/911
>
> or with a phonetic analyzer
>
> http://stackoverflow.com/questions/6936256/elastic-search-implement-did-you-mean
>
> is that what you were after?


Thanks for the suggestions, mate. Unfortunately, I can't do spell checking with a dictionary because many of the words are unique names. Eg:
Optimus Prime
Megatron
BE@RBRICK
etc

I was going to try a phonetic analyzer, but a lot of the names in my data are pronounced strangely, and thus wouldn't match. 

Re: Advice on my approach to this search problem

Nick Hoffman
In reply to this post by Clinton Gormley-2
On Monday, 17 October 2011 11:07:33 UTC-4, Clinton Gormley wrote:

> The reason your misspellings work is because you are using ngrams for
> both your search and index analyzers.

Yeah, I figured as much.

> This may, however, give your users weird results, eg the user searches
> for "slave" and gets a result for "lavatory" instead.
>
> I would consider making a few changes:
> 1) use edge ngrams rather than ngrams ie s,sl,sla,slav,slave

Interesting. Why do you recommend that? I understand that it prevents the slave/lavatory example, which is great. However, it prevents mid-word matches. But then again, maybe that's a good thing...

> 2) use the edge ngram analyzers only as your search_analyzer

So don't use them in any of the index analyzers?

> 3) for your misspellings, if you get no results, then retry
>    the query using some fuzziness:
>    http://www.elasticsearch.org/guide/reference/query-dsl/text-query.html

A text query with the "fuzziness" option, or a fuzzy query[1]?

Thanks for your advice, Clint, and also for that example nGram gist. Very helpful!

[1] http://www.elasticsearch.org/guide/reference/query-dsl/fuzzy-query.html

Re: Advice on my approach to this search problem

Clinton Gormley-2
Hiya

> > This may, however, give your users weird results, eg the user searches
> > for "slave" and gets a result for "lavatory" instead.
> >
> > I would consider making a few changes:
> > 1) use edge ngrams rather than ngrams ie s,sl,sla,slav,slave
>
> Interesting. Why do you recommend that? I understand that it prevents
> the slave/lavatory example, which is great. However, it prevents
> mid-word matches. But then again, maybe that's a good thing...

Yes exactly.  Full ngrams are useful for some purposes, eg matching
words in a URL, but in general, people start typing at one end of a word
and expect the search results to reflect that.

> > 2) use the edge ngram analyzers only as your search_analyzer
>
> So don't use them in any of the index analyzers?

Apologies - I meant the other way around. Use them in your index
analyzers, but use your ascii_std analyzer for search analyzers.
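
A rough Python sketch of why the two sides are analyzed differently. The `tokens` function is only a stand-in for a standard-style analyzer such as ascii_std (lowercase, split on whitespace), and `max_len=10` is an assumed setting, not one taken from the gist.

```python
def tokens(text):
    """Stand-in for a plain analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def edge_ngrams(token, max_len=10):
    """All prefixes of token, up to max_len characters."""
    return {token[:n] for n in range(1, min(max_len, len(token)) + 1)}


def edge_ngram_terms(text):
    """Every token of text expanded into its edge ngrams."""
    terms = set()
    for token in tokens(text):
        terms |= edge_ngrams(token)
    return terms


def matches(query, doc_text, ngram_search=False):
    """True if at least one search term equals an indexed term."""
    indexed = edge_ngram_terms(doc_text)  # index side: always edge ngrams
    if ngram_search:
        search_terms = edge_ngram_terms(query)  # edge ngrams on the search side too
    else:
        search_terms = set(tokens(query))  # plain tokens on the search side
    return bool(search_terms & indexed)


print(matches("trans", "Transformers"))                 # partial word is found
print(matches("slave", "Series 1"))                     # unrelated doc is not
print(matches("slave", "Series 1", ngram_search=True))  # matches on the gram "s"
```

If the search side is ngrammed as well, a query is broken down to single letters, so almost any document sharing a first letter can match. Keeping the search side plain means the query must be a prefix of an indexed word.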

> > 3) for your misspellings, if you get no results, then retry
> >    the query using some fuzziness:
> >    http://www.elasticsearch.org/guide/reference/query-dsl/text-query.html
>
> A text query with the "fuzziness" option, or a fuzzy query[1]?

text with fuzziness.  A fuzzy query is actually a term query - the
search terms are not analyzed. However, a text query with fuzziness
gives you the analysis plus the fuzzy behaviour.
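
The "retry with fuzziness" idea could look roughly like the Python sketch below. The fuzziness value of 0.5 is an arbitrary assumption, and `run_search` is a hypothetical callable standing in for whatever client sends the body to the _search endpoint; `fake_search` exists only so the example runs without a server.

```python
def text_query(field, terms, fuzziness=None):
    """Build a search body with a single 'text' query on one field."""
    if fuzziness is None:
        clause = terms
    else:
        clause = {"query": terms, "fuzziness": fuzziness}
    return {"query": {"text": {field: clause}}}


def search_with_fuzzy_retry(run_search, field, terms, fuzziness=0.5):
    """Run a plain text query first; retry with fuzziness only if nothing is found.

    run_search is a hypothetical callable: body dict -> list of hits.
    """
    hits = run_search(text_query(field, terms))
    if hits:
        return hits
    return run_search(text_query(field, terms, fuzziness))


def fake_search(body):
    """Demo backend: pretends that only the fuzzy query finds anything."""
    clause = body["query"]["text"]["catalog.name"]
    return ["Transformers"] if isinstance(clause, dict) else []


result = search_with_fuzzy_retry(fake_search, "catalog.name", "trasformers")
print(result)
```

Running the plain query first keeps exact and prefix matches ranked on their own, and the looser fuzzy query is only paid for when the user would otherwise see an empty page.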


> Thanks for your advice, Clint, and also for that example nGram gist.
> Very helpful!

glad to hear it :)

clint





Re: Advice on my approach to this search problem

Nick Hoffman

> Apologies - I meant the other way around. Use them in your index
> analyzers, but use your ascii_std analyzer for search analyzers.

Thanks, Clint. That's working a lot better. However, I've noticed that I can't combine certain fields in the same text query. It seems to be related to nested fields. Any idea why that might be?

For example, ES accepts this:

curl -X GET -s "http://localhost:9200/development_products/product/_search?pretty=true" -d '{ query: { text: { "items.name": "optimus", "catalog.name": "optimus" } } }' 

But adding the "name" field to the beginning or end:

curl -X GET -s "http://localhost:9200/development_products/product/_search?pretty=true" -d '{ query: { text: { "items.name": "optimus", "catalog.name": "optimus", "name": "optimus" } } }'

generates an error:

{
  "error" : "SearchPhaseExecutionException[Failed to execute phase [query], total failure; shardFailures {[V6gkYzvcSg6Gx-Ad9-hbOg][development_products][0]: SearchParseException[[development_products][0]: query[items.name:optimus],from[-1],size[-1]: Parse Failure [Failed to parse source [{ query: { text: { \"items.name\": \"optimus\", \"catalog.name\": \"optimus\", \"name\": \"optimus\" } } }]]]; nested: SearchParseException[[development_products][0]: query[items.name:optimus],from[-1],size[-1]: Parse Failure [No parser for element [name]]]; }{[V6gkYzvcSg6Gx-Ad9-hbOg][development_products][2]: SearchParseException[[development_products][2]: query[items.name:optimus],from[-1],size[-1]: Parse Failure [Failed to parse source [{ query: { text: { \"items.name\": \"optimus\", \"catalog.name\": \"optimus\", \"name\": \"optimus\" } } }]]]; nested: SearchParseException[[development_products][2]: query[items.name:optimus],from[-1],size[-1]: Parse Failure [No parser for element [name]]]; }{[V6gkYzvcSg6Gx-Ad9-hbOg][development_products][1]: SearchParseException[[development_products][1]: query[items.name:optimus],from[-1],size[-1]: Parse Failure [Failed to parse source [{ query: { text: { \"items.name\": \"optimus\", \"catalog.name\": \"optimus\", \"name\": \"optimus\" } } }]]]; nested: SearchParseException[[development_products][1]: query[items.name:optimus],from[-1],size[-1]: Parse Failure [No parser for element [name]]]; }{[V6gkYzvcSg6Gx-Ad9-hbOg][development_products][3]: SearchParseException[[development_products][3]: query[items.name:optimus],from[-1],size[-1]: Parse Failure [Failed to parse source [{ query: { text: { \"items.name\": \"optimus\", \"catalog.name\": \"optimus\", \"name\": \"optimus\" } } }]]]; nested: SearchParseException[[development_products][3]: query[items.name:optimus],from[-1],size[-1]: Parse Failure [No parser for element [name]]]; }{[V6gkYzvcSg6Gx-Ad9-hbOg][development_products][4]: SearchParseException[[development_products][4]: 
query[items.name:optimus],from[-1],size[-1]: Parse Failure [Failed to parse source [{ query: { text: { \"items.name\": \"optimus\", \"catalog.name\": \"optimus\", \"name\": \"optimus\" } } }]]]; nested: SearchParseException[[development_products][4]: query[items.name:optimus],from[-1],size[-1]: Parse Failure [No parser for element [name]]]; }]",
  "status" : 500
}

Re: Advice on my approach to this search problem

Clinton Gormley-2

> Thanks, Clint. That's working a lot better. However, I've noticed that
> I can't combine certain fields in the same text query. It seems to be
> related to nested fields. Any idea why that might be?

You can't pass multiple field/search_text pairs to a single text query.
ES needs to know how to combine your various queries, so you need to
have each as a separate 'text' query and combine them using either bool
or dis_max.
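
As a sketch, the two ways of combining one text query per field could be built like this in Python. The field names come from the document example earlier in the thread; the helper names are made up for illustration.

```python
import json

FIELDS = ["name", "catalog.name", "items.name"]


def per_field_text_queries(terms, fields=FIELDS):
    """One separate 'text' query per field, each searching for the same terms."""
    return [{"text": {field: terms}} for field in fields]


def dis_max_query(terms, fields=FIELDS):
    """Score each doc by its best-matching field."""
    return {"query": {"dis_max": {"queries": per_field_text_queries(terms, fields)}}}


def bool_query(terms, fields=FIELDS):
    """Match if any field matches; scores of matching fields are combined."""
    return {"query": {"bool": {"should": per_field_text_queries(terms, fields)}}}


# Either body can be sent with curl -d, as in the commands above.
print(json.dumps(dis_max_query("optimus"), indent=2))
print(json.dumps(bool_query("optimus"), indent=2))
```

dis_max tends to suit this case, where the same word is searched in several fields and the best single field should decide the score, but that is a judgment call rather than something stated in the thread.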

clint




Re: Advice on my approach to this search problem

Nick Hoffman
That makes sense. Thanks. With dis_max, though, it looks like the edge-ngrams aren't being used.

For example, there're 3 documents whose "name" field is "Optimus Primal", and 35 whose "name" field is "Optimus Prime". I figured that a dis_max-text query for "primal" would match "Optimus Primal" and "Optimus Prime" docs. Unfortunately, only the 3 "Optimus Primal" docs matched. Why might that be?

curl -X GET -s "http://localhost:9200/development_products/product/_search?pretty=true" -d '
{
  fields: [ "name" ],
  query: {
    dis_max: {
      queries: [
        { text: { "name" : "primal" } }
      ]
    }
  }
}
'

Re: Advice on my approach to this search problem

Clinton Gormley-2

>
> For example, there're 3 documents whose "name" field is "Optimus
> Primal", and 35 whose "name" field is "Optimus Prime". I figured that
> a dis_max-text query for "primal" would match "Optimus Primal" and
> "Optimus Prime" docs. Unfortunately, only the 3 "Optimus Primal" docs
> matched. Why might that be?

Because you are no longer using ngrams on your search analyzer, we're
essentially doing a search for "primal*".

Try the same thing but search for "prim" instead
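
A small Python sketch of what happens with those two names, assuming edge ngrams of up to 10 characters at index time and an unmodified search term (both are assumptions about the mapping, which isn't shown here):

```python
def edge_ngrams(token, max_len=10):
    """All prefixes of token (lowercased), up to max_len characters."""
    token = token.lower()
    return {token[:n] for n in range(1, min(max_len, len(token)) + 1)}


def indexed_terms(name):
    """Index-time terms for a name: edge ngrams of each word."""
    terms = set()
    for word in name.split():
        terms |= edge_ngrams(word)
    return terms


NAMES = ["Optimus Prime", "Optimus Primal"]


def hits_for(search_term):
    """Names whose indexed terms contain the search term exactly."""
    return sorted(name for name in NAMES if search_term in indexed_terms(name))


print("primal ->", hits_for("primal"))
print("prim   ->", hits_for("prim"))
```

"primal" is never one of the prefixes of "prime", so only the "Optimus Primal" docs can match it, while "prim" is a prefix of both words.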

clint
