Using PatternTokenizer

ppearcy
Hello,
  Is it correct that in order to use the PatternTokenizer, one would
need to implement a plugin similar to the ICU one?

Thanks,
Paul

Re: Using PatternTokenizer

kimchy
Administrator
Yes, but it can be part of the built-in analyzers in elasticsearch (I assume you refer to the one in Lucene).

-shay.banon


Re: Using PatternTokenizer

kimchy
Administrator
Added this: http://github.com/elasticsearch/elasticsearch/issues/issue/276.


Re: Using PatternTokenizer

ppearcy
In reply to this post by kimchy
Yeah, it probably makes sense to have it built in. I'd be happy to
create a fork and submit it. I'd plan on exposing the pattern,
lowercase, and stopwords options, which map directly to Lucene's
PatternAnalyzer inputs.
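
To make the three options concrete, here is a minimal sketch in Python of
what a pattern-based analyzer does with them. It is not Lucene's code: the
function name, parameter names, and the default pattern are assumptions
chosen for illustration only.

```python
import re

def pattern_analyze(text, pattern=r"\W+", lowercase=True, stopwords=()):
    """Split text on a regex, then optionally lowercase and drop stop words.

    Hypothetical helper for illustration; it only mirrors the idea of the
    pattern / lowercase / stopwords options discussed above.
    The pattern is assumed to contain no capturing groups.
    """
    # Compare stop words in the same case the tokens will end up in.
    stop = {w.lower() for w in stopwords} if lowercase else set(stopwords)
    tokens = []
    for piece in re.split(pattern, text):
        if not piece:
            continue  # re.split yields empty strings at the text boundaries
        token = piece.lower() if lowercase else piece
        if token not in stop:
            tokens.append(token)
    return tokens

print(pattern_analyze("The Quick, brown fox!", stopwords=["the"]))
# prints ['quick', 'brown', 'fox']
```

The point of the sketch is that all three steps are fixed inside one
analyzer, which is why the options can be exposed as plain settings.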

A separate pattern tokenizer would be nice to combine with other
options, but that doesn't appear to exist in Lucene (though Solr has a
more flexible version based on regex grouping that will probably be
available with the Lucene/Solr merge). Not that it would be hard to
write; I just don't need it for my use case.
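
For comparison, a rough sketch of the standalone-tokenizer idea, again in
Python rather than Java. It assumes a `group` argument in the spirit of the
Solr version: a negative value treats the pattern as a separator, while a
non-negative value emits that regex group of each match as the token. The
names here are hypothetical and this is not Solr's implementation.

```python
import re

def pattern_tokenize(text, pattern, group=-1):
    """Tokenize with a regex, either by splitting or by extracting a group.

    Hypothetical helper: group < 0 means the pattern matches separators;
    group >= 0 means each match's given group becomes a token.
    """
    regex = re.compile(pattern)
    if group >= 0:
        # Extraction mode: skip matches where the group is empty or unset.
        return [m.group(group) for m in regex.finditer(text) if m.group(group)]
    # Split mode: collect the text between successive separator matches.
    tokens, start = [], 0
    for m in regex.finditer(text):
        if m.start() > start:
            tokens.append(text[start:m.start()])
        start = m.end()
    if start < len(text):
        tokens.append(text[start:])
    return tokens

# Because this is only a tokenizer, later steps can be chained freely.
print([t.lower() for t in pattern_tokenize("AAA, bbb,CCC", r",\s*")])
# prints ['aaa', 'bbb', 'ccc']
print(pattern_tokenize("key='v1' other='v2'", r"'([^']+)'", group=1))
# prints ['v1', 'v2']
```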

Thanks,
Paul


Re: Using PatternTokenizer

ppearcy
Huh, somehow Nabble (which shows your response referencing
http://github.com/elasticsearch/elasticsearch/issues/issue/276) and
Google Groups (which doesn't) are out of sync.

Anyway, thanks a ton! Seems straightforward, and I'll let you know if
there are any issues.

Best Regards,
Paul
