Support for Anchoring in Elasticsearch Regex

vaidik
Hi Folks,

I see that Elasticsearch supports regex, but only through Lucene's regex engine, which does not support anchor operators: every pattern is implicitly anchored, so it must match the entire string. This works as long as the regular expressions are fixed in advance, but it becomes very limiting when the regex comes from the user.

Is there an alternative regex engine for Elasticsearch that at least supports $ and ^ for anchoring? A quick Google and GitHub search did not turn up anything. If not, is anybody doing something similar, or does anyone have a workaround? One possible solution I can think of is converting the user's regex to a Lucene-compatible regex, but that gets really complex to do correctly with all the grouping and alternation in regex.

I don't need full Perl-style regex support; just the anchoring bit is important. Has anybody tried to solve this problem before?

Thanks,
Vaidik Kapoor
vaidikkapoor.info

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CACWtv5%3DvoJ1B7K9CN3M0O9hvLTyV0cJVq3qiy%2BJiy0crTfPRjg%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Support for Anchoring in Elasticsearch Regex

Lee Gee
Lucene and Elasticsearch both anchor regular expressions by default.

"Lucene’s patterns are always anchored. The pattern provided must match the entire string."

— http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax
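
Because the whole string must match, an unanchored search has to be written with an explicit `.*` on each side. A minimal sketch of the conversion mentioned in the question, in Python, assuming the user's pattern only uses `^` at the start and `$` at the end of a top-level alternative. The function names are hypothetical, not part of Elasticsearch or Lucene. Anchors inside groups, and characters that are special only in Lucene's syntax, are not handled here.

```python
def split_top_level(pattern):
    """Split a regex on '|' characters that sit outside groups and classes."""
    branches, current = [], []
    depth = 0          # nesting depth of ( ... )
    in_class = False   # True while inside [ ... ]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            current.append(pattern[i:i + 2])  # keep an escaped pair intact
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            branches.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    branches.append("".join(current))
    return branches


def ends_with_anchor(branch):
    """True if the branch ends in an unescaped '$'."""
    if not branch.endswith("$"):
        return False
    before = branch[:-1]
    backslashes = len(before) - len(before.rstrip("\\"))
    return backslashes % 2 == 0


def to_lucene(pattern):
    """Rewrite a '^'/'$' style regex for an engine that always anchors."""
    converted = []
    for branch in split_top_level(pattern):
        prefix, suffix = ".*", ".*"
        if branch.startswith("^"):
            branch, prefix = branch[1:], ""   # explicit start anchor: no padding
        if ends_with_anchor(branch):
            branch, suffix = branch[:-1], ""  # explicit end anchor: no padding
        converted.append(prefix + branch + suffix)
    return "|".join(converted)


# Body of a regexp query; "title" is a placeholder field name.
query = {"regexp": {"title": to_lucene("^foo|bar$")}}
```

Each top-level alternative is padded separately, so a pattern such as `^foo|bar$` keeps its usual meaning (starts with "foo" or ends with "bar") instead of being treated as a single anchored group.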


On Wednesday, December 18, 2013 7:19:48 AM UTC, Vaidik Kapoor wrote:
Hi Folks,

I see that Elasticsearch supports regex, but only through Lucene's regex engine, which does not support anchor operators: every pattern is implicitly anchored, so it must match the entire string. This works as long as the regular expressions are fixed in advance, but it becomes very limiting when the regex comes from the user.

Is there an alternative regex engine for Elasticsearch that at least supports $ and ^ for anchoring? A quick Google and GitHub search did not turn up anything. If not, is anybody doing something similar, or does anyone have a workaround? One possible solution I can think of is converting the user's regex to a Lucene-compatible regex, but that gets really complex to do correctly with all the grouping and alternation in regex.

I don't need full Perl-style regex support; just the anchoring bit is important. Has anybody tried to solve this problem before?

Thanks,
Vaidik Kapoor
vaidikkapoor.info