Re: Preventing phrase search from matching across sentence boundaries.
I can't think of an analysis component that does this out of the box, but it is a requirement that comes up often. If you're comfortable writing a TokenFilter yourself, you could create one that inflates the position increment of the token that follows whichever characters are of interest. The tokenizer would need to keep those characters long enough for the filter to see them. Alternatively, you could break up your data before indexing it into ElasticSearch, so that each sentence or part of a sentence becomes a separate value of the field. Multiple values for a field can be indexed with a large position increment between them (the gap is set in the field mapping), so a phrase query will not match across two values.
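To make the idea concrete, here is a rough sketch in plain Python. It is only a model of how positions behave, not Lucene or ElasticSearch code. The names positions, phrase_matches, split_values and SENTENCE_GAP are made up for illustration, and the gap of 100 is an assumed value.

```python
import re

# Assumed gap size for illustration; it only needs to exceed any slop in use.
SENTENCE_GAP = 100


def positions(text, gap=SENTENCE_GAP):
    """Return (term, position) pairs, jumping by `gap` after '.' or ','."""
    out = []
    pos = -1
    pending_gap = 0
    for match in re.finditer(r"\w+|[.,]", text.lower()):
        tok = match.group()
        if tok in ".,":
            # The boundary character is not indexed; it only widens the gap.
            pending_gap = gap
            continue
        pos += 1 + pending_gap
        pending_gap = 0
        out.append((tok, pos))
    return out


def phrase_matches(text, phrase):
    """Exact phrase match (slop 0): terms must sit at consecutive positions."""
    terms = phrase.lower().split()
    if not terms:
        return False
    by_pos = {p: t for t, p in positions(text)}
    return any(
        all(by_pos.get(start + i) == term for i, term in enumerate(terms))
        for start in by_pos
    )


def split_values(text):
    """The pre-indexing alternative: one field value per sentence part."""
    return [part.strip() for part in re.split(r"[.,]", text) if part.strip()]


print(positions("one. two"))                  # [('one', 0), ('two', 101)]
print(phrase_matches("one two", "one two"))   # True
print(phrase_matches("one. two", "one two"))  # False
print(split_values("one. two, three"))        # ['one', 'two', 'three']
```

A custom TokenFilter would do the equivalent of the pending_gap step on the real token stream, while split_values corresponds to sending the field as an array of values and letting the gap between values do the work.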
On Tuesday, November 13, 2012 4:13:28 AM UTC+13, Robin Hughes wrote:
I'd like to know if there is a way to configure analysis so that periods and commas result in a position increment. The purpose of this is so that phrase queries will not match across sentence boundaries.
For example, a span_near query with the terms "one" and "two" and a slop of zero would match a document containing "one two" but not one containing "one. two".