Preventing phrase search from matching across sentence boundaries.

Robin Hughes
Hi

I'd like to know if there is a way to configure analysis so that periods and commas result in a position increment. The purpose is to ensure that phrase queries do not match across sentence boundaries.

For example, a span_near query with the terms "one" and "two" and a slop of zero would match a document containing "one two" but not one containing "one. two".
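
To make the problem concrete, here is a minimal sketch in plain Python. It is a simplified model of position-based phrase matching, not Elasticsearch's actual analyzer: the function names analyze and phrase_match are made up for illustration, and the tokenizer is assumed to drop punctuation and number the remaining terms consecutively.

```python
import re

def analyze(text):
    """Simplified analyzer: drop punctuation, number terms 0, 1, 2, ..."""
    return [(m.group().lower(), pos)
            for pos, m in enumerate(re.finditer(r"\w+", text))]

def phrase_match(tokens, phrase):
    """True if the phrase terms occur at consecutive positions (slop 0)."""
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, set()).add(pos)
    first, *rest = phrase
    return any(
        all(start + i + 1 in positions.get(term, ()) for i, term in enumerate(rest))
        for start in positions.get(first, ())
    )

# Both documents produce the positions one=0, two=1, so both match.
print(phrase_match(analyze("one two"), ["one", "two"]))   # True
print(phrase_match(analyze("one. two"), ["one", "two"]))  # True (unwanted)
```

Because the period leaves no trace in the positions, the two documents are indistinguishable to a phrase query under this model.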

Regards

Robin

Re: Preventing phrase search from matching across sentence boundaries.

Chris Male
Hi Robin,

I can't think of an analysis component that does this out of the box, but it is a requirement that comes up often. If you're comfortable creating a TokenFilter yourself, you could write one that inflates the position increment at whatever characters are of interest. Alternatively, you could break up your data before indexing it into ElasticSearch, so that each sentence or part of a sentence is a new value. Multiple values for a field are indexed with large position increments in between them.
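
Both suggestions can be sketched in plain Python. This is only a model of the idea, not the Lucene TokenFilter API: every function name and the GAP value of 100 are assumptions chosen for illustration. As far as I know, the gap between values of a multi-valued field in Elasticsearch is set by a mapping parameter (position_increment_gap in current versions, position_offset_gap in older ones), so it is worth checking that setting before relying on the second approach.

```python
import re

GAP = 100  # assumed gap; it only needs to exceed the largest slop you query with

def analyze_with_gaps(text, gap=GAP):
    """Approach 1: inflate the position of the token following '.' or ','."""
    tokens, pos, pending = [], -1, 0
    for m in re.finditer(r"\w+|[.,]", text):
        piece = m.group()
        if piece in (".", ","):
            pending = gap          # remember the gap, emit no token
            continue
        pos += 1 + pending
        pending = 0
        tokens.append((piece.lower(), pos))
    return tokens

def split_values(text):
    """Approach 2, step 1: split the text into one value per sentence part."""
    return [s.strip() for s in re.split(r"[.,]", text) if s.strip()]

def analyze_values(values, gap=GAP):
    """Approach 2, step 2: index each value, leaving `gap` between values."""
    tokens, offset = [], 0
    for value in values:
        words = re.findall(r"\w+", value)
        tokens.extend((w.lower(), offset + i) for i, w in enumerate(words))
        offset += len(words) + gap
    return tokens

def phrase_match(tokens, phrase):
    """True if the phrase terms occur at consecutive positions (slop 0)."""
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, set()).add(pos)
    first, *rest = phrase
    return any(
        all(start + i + 1 in positions.get(term, ()) for i, term in enumerate(rest))
        for start in positions.get(first, ())
    )

print(phrase_match(analyze_with_gaps("one two"), ["one", "two"]))   # True
print(phrase_match(analyze_with_gaps("one. two"), ["one", "two"]))  # False
print(analyze_values(split_values("one. two")))  # [('one', 0), ('two', 101)]
```

In this model the two approaches give the same positions. Note that the sketch treats every period and comma as a boundary, so input such as "3.14" or "U.S." would also be split; a real filter would need a rule for those cases.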

(In reply to Robin Hughes's post of Tuesday, November 13, 2012, 4:13:28 AM UTC+13.)

Re: Preventing phrase search from matching across sentence boundaries.

Reza Sadoddin
In reply to this post by Robin Hughes
This requirement came up in my work, and I found that it had already been raised in this group. I am using a Wikipedia index built through a river plugin and need to limit the scope of "nearby" (proximity) queries.


Can this be done easily in newer versions of ES by non-expert users?

If not, I would appreciate it if you could share your experience developing a custom analyzer for this purpose.
Thanks,