More Like This (mlt) vs query_string or multi_match


Mike
What is the difference between mlt and query_string/multi_match queries?

Besides the fact that query_string goes through the query parsing process, I don't see how More Like This differs from these two query types. All three let me specify multiple fields to search against and boost the fields differently. Am I missing something that mlt does differently for scoring under the covers, compared to query_string and multi_match?

Re: More Like This (mlt) vs query_string or multi_match

Anil Rhemtulla
Still learning about all this myself, but I do know that MLT is quite different: it looks at the terms in the source document, whereas a query_string would only search on a few words of the document (assuming you have a non-trivial document).

This might help:
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/
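
To make the "looks at terms in the source document" idea concrete, here is a rough Python sketch of MLT-style term selection. It is a toy under stated assumptions, not Lucene's implementation: tokenizing is a plain whitespace split, the idf formula is simplified, and the function name `select_mlt_terms` and its parameters are made up for illustration (they only loosely mirror the real MLT options).

```python
import math
from collections import Counter


def select_mlt_terms(source_text, corpus, max_query_terms=5,
                     min_term_freq=1, min_doc_freq=1):
    """Rank the source document's terms by a simplified tf*idf score.

    Hypothetical helper for illustration only; real MLT runs the text
    through an analyzer and uses Lucene's own similarity formula.
    """
    docs = [set(doc.lower().split()) for doc in corpus]
    term_freqs = Counter(source_text.lower().split())

    scored = []
    for term, freq in term_freqs.items():
        doc_freq = sum(1 for doc in docs if term in doc)
        # Skip terms unknown to the corpus or below the thresholds.
        if doc_freq == 0 or freq < min_term_freq or doc_freq < min_doc_freq:
            continue
        idf = math.log(len(docs) / doc_freq) + 1.0  # simplified idf
        scored.append((freq * idf, term))

    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_query_terms]]


def to_bool_query(field, terms):
    """Wrap the selected terms in a bool query made of 'should' clauses."""
    return {"bool": {"should": [{"term": {field: t}} for t in terms]}}
```

Called with a whole document as `source_text`, this picks the handful of terms that best characterize it, which is the step a hand-written query_string skips because the user chooses the words.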




On Friday, December 7, 2012 2:24:54 PM UTC-8, Mike wrote:
What is the difference between mlt and query_string/multi_match queries?

Besides the fact that query_string goes through the query parsing process, I don't see how More Like This differs from these two query types. All three let me specify multiple fields to search against and boost the fields differently. Am I missing something that mlt does differently for scoring under the covers, compared to query_string and multi_match?

Re: More Like This (mlt) vs query_string or multi_match

Mike
So are you saying that MLT uses tf*idf for relevance scoring, while query_string & multi_match just check for the existence of the terms, like in a boolean query, for scoring?

I found this in the group search though, where Shay says that MLT is essentially just a boolean query with all the terms in should clauses.
https://groups.google.com/d/msg/elasticsearch/QVTWA48wxcc/6sOS6huIip8J

Isn't that essentially what the other 2 queries do when you have the default operator set to OR?  If MLT is the only query that uses tf*idf, then what does query_string/multi_match do for scoring that is different?




On Saturday, December 8, 2012 12:49:26 AM UTC-5, Anil Rhemtulla wrote:
Still learning about all this myself, but I do know that MLT is quite different: it looks at the terms in the source document, whereas a query_string would only search on a few words of the document (assuming you have a non-trivial document).

This might help:
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/




Re: More Like This (mlt) vs query_string or multi_match

Igor Motov-3
MLT, query_string and multi_match all use tf*idf. And as Shay said, you can think of MLT as a big boolean query. MLT just has a lot of "knobs" you can tweak that affect the resulting boolean query: you can remove very frequent and very rare terms from your query, set a boost for terms occurring in your query multiple times, set what percentage of the query terms must match, and so on.
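
For comparison, here is a small Python sketch that builds both request bodies as plain dicts. The MLT option names (`like_text`, `min_term_freq`, `min_doc_freq`, `max_doc_freq`, `max_query_terms`, `boost_terms`, `percent_terms_to_match`) are the ones from the Elasticsearch versions current at the time of this thread; newer releases renamed `like_text` to `like` and replaced `percent_terms_to_match` with `minimum_should_match`, so check the docs for your version. The helper names, the field names `title` and `body`, and the `max_doc_freq` value are assumptions made up for the example.

```python
import json


def build_mlt_query(like_text, fields):
    """more_like_this body with the main 'knobs' spelled out."""
    return {"query": {"more_like_this": {
        "fields": fields,
        "like_text": like_text,          # typically a whole document's text
        "min_term_freq": 2,              # drop terms rare in the input text
        "min_doc_freq": 5,               # drop terms rare in the index
        "max_doc_freq": 10000,           # drop very common terms (illustrative value)
        "max_query_terms": 25,           # cap on the generated should clauses
        "boost_terms": 1.5,              # boost factor for the selected terms
        "percent_terms_to_match": 0.3,   # share of clauses that must match
    }}}


def build_multi_match_query(text, fields):
    """multi_match body: the caller's words are used as given."""
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


if __name__ == "__main__":
    print(json.dumps(build_mlt_query("full text of some document", ["title", "body"]), indent=2))
    print(json.dumps(build_multi_match_query("a few words", ["title^2", "body"]), indent=2))
```

Both bodies end up scored with tf*idf; the difference is that MLT first filters and caps the terms taken from the input text, while multi_match searches for exactly the words it is given.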

On Monday, December 10, 2012 11:58:07 AM UTC-5, Mike wrote:
So are you saying that MLT uses tf*idf for relevance scoring, while query_string & multi_match just check for the existence of the terms, like in a boolean query, for scoring?

I found this in the group search though, where Shay says that MLT is essentially just a boolean query with all the terms in should clauses.
https://groups.google.com/d/msg/elasticsearch/QVTWA48wxcc/6sOS6huIip8J

Isn't that essentially what the other 2 queries do when you have the default operator set to OR?  If MLT is the only query that uses tf*idf, then what does query_string/multi_match do for scoring that is different?



