Index-per-user required for common terms query and cutoff_frequency?

Loren
The docs mention that "One of the benefits of cutoff_frequency is that you get domain-specific stopwords for free." 

It seems that the index-per-user approach is required here to make the term frequencies accurate. If you used a shared index, or even faked an index per user, the term frequency counts for a field would reflect the index as a whole (aggregated across the counts for each shard in that index), not just that user's documents. If you typically query one user's documents at a time through a filter field, the common terms query would probably not return the results you expect.

Am I understanding this correctly?
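
For illustration, here is a minimal sketch of the kind of request being described: a common terms query restricted to one user's documents in a shared index. The field names `body` and `user_id`, the helper `build_user_search`, and the default cutoff of 0.001 are assumptions made up for this example. The `filtered` wrapper is the 1.x-era syntax; later Elasticsearch versions replaced it with a `bool` query and a `filter` clause.

```python
import json


def build_user_search(user_id, text, cutoff_frequency=0.001):
    """Build a search body: common terms query limited to one user's documents.

    Hypothetical helper. The field names "body" and "user_id" are assumed.
    """
    return {
        "query": {
            "filtered": {
                # Terms above cutoff_frequency are treated as high-frequency
                # ("stopword-like") terms by the common terms query.
                "query": {
                    "common": {
                        "body": {
                            "query": text,
                            "cutoff_frequency": cutoff_frequency,
                        }
                    }
                },
                # The filter narrows the hits, but it does not change the
                # term statistics used to decide what counts as "common".
                "filter": {"term": {"user_id": user_id}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_user_search("loren", "the quick brown fox"), indent=2))
```

The filter only restricts which documents can match. The statistics behind `cutoff_frequency` still come from everything stored alongside that user's documents, which is the concern raised above.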

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/398cfc81-ba3e-458c-840f-aee5e94902c4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Index-per-user required for common terms query and cutoff_frequency?

Nikolas Everett


On Wed, Apr 29, 2015 at 2:53 PM, Loren <[hidden email]> wrote:
> [...]
> Am I understanding this correctly?

I think you understand the issue perfectly, yes. cutoff_frequency is evaluated per shard, so each shard would need to contain documents from only a single domain for the domain-specific stopwords to really work.
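
To make the per-shard effect concrete, here is a small pure-Python sketch. It is a simplified model, not Elasticsearch's actual implementation: it assumes a term counts as "common" when the number of documents containing it exceeds the cutoff, with a cutoff below 1 read as a fraction of the documents on the shard. The function `high_frequency_terms` and the sample documents are invented for the example.

```python
from collections import Counter


def high_frequency_terms(docs, cutoff_frequency):
    """Return the terms a shard holding `docs` would treat as common.

    Simplified model (an assumption): a cutoff below 1 is a fraction of the
    shard's documents, otherwise it is an absolute document count.
    """
    doc_freq = Counter()
    for text in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(text.lower().split()))
    if cutoff_frequency < 1:
        threshold = cutoff_frequency * len(docs)
    else:
        threshold = cutoff_frequency
    return {term for term, df in doc_freq.items() if df > threshold}


# Two hypothetical users whose documents come from different domains.
legal_docs = [
    "contract renewal terms",
    "contract dispute filing",
    "contract signature page",
    "contract termination notice",
]
cooking_docs = [
    "recipe for bread",
    "recipe with garlic",
    "recipe using lemon",
    "recipe for soup",
]

if __name__ == "__main__":
    # One shard per user: each domain gets its own stopword.
    print(high_frequency_terms(legal_docs, 0.6))
    print(high_frequency_terms(cooking_docs, 0.6))
    # Shared shard: neither term clears the cutoff any more.
    print(high_frequency_terms(legal_docs + cooking_docs, 0.6))
```

With one user per shard, "contract" and "recipe" each clear the cutoff in their own domain. Once the two users share a shard, each term appears in only half of the documents and neither is treated as common.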

Nik
