How to set the analyzer?


John Chang
My Lucene app (which I am converting to ElasticSearch) uses org.apache.lucene.analysis.snowball.SnowballAnalyzer as its analyzer. I like it for its stemming abilities. How can I get support for this (or another analyzer) in ElasticSearch? Thanks.

I am currently on ElasticSearch 0.7.1.

Re: How to set the analyzer?

kimchy
Administrator
http://www.elasticsearch.com/docs/elasticsearch/index_modules/analysis/

(In reply to John Chang's message of Tue, May 25, 2010 at 10:24 PM.)

Re: How to set the analyzer?

John Chang
I looked in the org.elasticsearch.index.analysis directory and could not find any analyzer provider for the SnowballAnalyzer. So, using the BrazilianAnalyzerProvider as an example, I tried to create my own, but I keep getting this error upon startup:
"1) Could not find a suitable constructor in com.kiha.server.services.index.KihaSnowballAnalyzer. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private."

However, my class does have one constructor with an @Inject annotation. This is what I did.

I added this to my elasticsearch.json:
index:
  analysis:
    analyzer:
      standard:
        type: com.mycompany.server.services.index.MyCompanySnowballAnalyzerProvider

I put com.mycompany.server.services.index.MyCompanySnowballAnalyzerProvider into a jar in the ./lib directory.  The class has this constructor:

    @Inject public MyCompanySnowballAnalyzerProvider(Index index, @IndexSettings Settings indexSettings, @Assisted String name, @Assisted Settings settings) {
        super(index, indexSettings, name);
        analyzer = new SnowballAnalyzer(Version.LUCENE_CURRENT, "English");
    }


The class extends AbstractAnalyzerProvider:
public class MyCompanySnowballAnalyzerProvider extends AbstractAnalyzerProvider<SnowballAnalyzer>

Re: How to set the analyzer?

John Chang
Typo: in the above message, I meant to replace "kiha" with "mycompany" throughout, but missed some places.  So "Could not find a suitable constructor in com.kiha.server.services.index.KihaSnowballAnalyzer." should be read as "Could not find a suitable constructor in com.mycompany.server.services.index.MyCompanySnowballAnalyzerProvider." in order to correspond to the rest of my post.  Thanks.

Re: How to set the analyzer?

kimchy
Administrator
Strange, this should work. You get an exception for com.kiha.server.services.index.KihaSnowballAnalyzer, while I would expect the exception to be for com.kiha.server.services.index.KihaSnowballAnalyzerProvider (note the Provider at the end). Which one are you getting?

(In reply to John Chang's message of Thu, May 27, 2010 at 12:23 AM.)

Re: How to set the analyzer?

John Chang
I am getting an exception from com.kiha.server.services.index.KihaSnowballAnalyzer.  Although perhaps oddly named, this class IS an extension of AbstractAnalyzerProvider.  It is declared as:

   public class KihaSnowballAnalyzer extends AbstractAnalyzerProvider<SnowballAnalyzer> 

I do not have a class named KihaSnowballAnalyzerProvider.

Re: How to set the analyzer?

kimchy
Administrator
Can you write a quick test case and I will check why this happens?

(In reply to John Chang's message of Wed, Jun 2, 2010 at 1:29 AM.)

Re: How to set the analyzer?

John Chang
I can't write a JUnit test case to repro the issue, because I can't get the server to start up without error. The test is simply to start up the server and look for errors. If you want to repro it, all I can think of is to run my class and config. To repro:

1) Use 0.7.1 (there is no particular reason I'm not on the latest, other than that it changes often; I can try with the latest if you advise)
2) Add this to your elasticsearch.yml:
index:
  analysis:
    analyzer:
      standard:
        type: com.kiha.server.services.index.KihaSnowballAnalyzer

3) Compile this class and put it in your classpath.  Note that it is modeled on the BrazilianAnalyzer I found.  Note also that to get it to compile, you need lucene-snowball-3.0.0-sources.jar.

Thanks for your time.  


package com.kiha.server.services.index;


import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Iterators;
import com.google.inject.Inject;
import com.google.inject.assistedinject.Assisted;
//import org.apache.lucene.analysis.br.BrazilianAnalyzer;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.util.Version;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.analysis.AbstractAnalyzerProvider;
import org.elasticsearch.index.settings.IndexSettings;
import org.elasticsearch.util.settings.Settings;

import java.util.Set;


public class KihaSnowballAnalyzer extends AbstractAnalyzerProvider<SnowballAnalyzer> {

    private final Set<?> stopWords;

    private final Set<?> stemExclusion;

    private final SnowballAnalyzer analyzer;

    @Inject public KihaSnowballAnalyzer(Index index, @IndexSettings Settings indexSettings, @Assisted String name, @Assisted Settings settings) {
        super(index, indexSettings, name);
        String[] stopWords = settings.getAsArray("stopwords");
        if (stopWords.length > 0) {
            this.stopWords = ImmutableSet.copyOf(Iterators.forArray(stopWords));
        } else {
            this.stopWords = ImmutableSet.copyOf(Iterators.forArray());
            //this.stopWords = BrazilianAnalyzer.getDefaultStopSet();
        }

        String[] stemExclusion = settings.getAsArray("stem_exclusion");
        if (stemExclusion.length > 0) {
            this.stemExclusion = ImmutableSet.copyOf(Iterators.forArray(stemExclusion));
        } else {
            this.stemExclusion = ImmutableSet.of();
        }
        analyzer = new SnowballAnalyzer(Version.LUCENE_CURRENT, "English");
    }

    @Override public SnowballAnalyzer get() {
        return this.analyzer;
    }
}
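
The messages above do not settle the cause of the error. One possible explanation, offered only as an assumption and not confirmed anywhere in this thread, is that the container looks for an @Inject annotation from a different package than the com.google.inject.Inject imported in the class (this could happen, for example, if the server ships a repackaged copy of Guice). The sketch below is plain Java with no ElasticSearch or Guice dependency. Every name in it (InjectLookupSketch, ContainerPkg, OtherPkg, findInjectable) is hypothetical, and the lookup only imitates the rule quoted in the error message; it is not Guice's actual implementation.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

public class InjectLookupSketch {

    // Hypothetical stand-ins for two packages that each define an annotation
    // with the simple name "Inject". Nested classes play the role of packages.
    public static class ContainerPkg {
        @Retention(RetentionPolicy.RUNTIME)
        @Target(ElementType.CONSTRUCTOR)
        public @interface Inject {}
    }

    public static class OtherPkg {
        @Retention(RetentionPolicy.RUNTIME)
        @Target(ElementType.CONSTRUCTOR)
        public @interface Inject {}
    }

    // Annotated with the type the (imitated) container looks for.
    public static class RecognizedProvider {
        final String name;
        @ContainerPkg.Inject public RecognizedProvider(String name) { this.name = name; }
    }

    // Annotated with a same-named type from the "wrong package".
    public static class UnrecognizedProvider {
        final String name;
        @OtherPkg.Inject public UnrecognizedProvider(String name) { this.name = name; }
    }

    // Imitates the rule from the error message: exactly one constructor
    // annotated with injectType, or else a non-private zero-argument one.
    public static Constructor<?> findInjectable(Class<?> type,
                                                Class<? extends Annotation> injectType) {
        Constructor<?> annotated = null;
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (c.isAnnotationPresent(injectType)) {
                if (annotated != null) {
                    throw new IllegalStateException(
                            "More than one annotated constructor in " + type.getName());
                }
                annotated = c;
            }
        }
        if (annotated != null) {
            return annotated;
        }
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (c.getParameterTypes().length == 0 && !Modifier.isPrivate(c.getModifiers())) {
                return c;
            }
        }
        throw new IllegalStateException(
                "Could not find a suitable constructor in " + type.getName());
    }

    public static void main(String[] args) {
        // Lookup succeeds: the annotation type matches what the container expects.
        Constructor<?> ok = findInjectable(RecognizedProvider.class, ContainerPkg.Inject.class);
        System.out.println("Recognized constructor: " + ok);

        // Lookup fails even though the source shows an @Inject on the constructor.
        try {
            findInjectable(UnrecognizedProvider.class, ContainerPkg.Inject.class);
        } catch (IllegalStateException e) {
            System.out.println("Lookup failed: " + e.getMessage());
        }
    }
}
```

If this assumption applies, the check to make is which package the @Inject (and @Assisted) imports in the bundled BrazilianAnalyzerProvider class file actually resolve to, and whether the custom class uses the same ones.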