Problem configuring PatternReplaceFilter

Alexander Reelsen
Hi there,

I am having trouble configuring the pattern replace filter.

My configuration looks like this:

index:
  analysis:
    analyzer:
      default:
        type: ae_analyzer

      ae_analyzer:
        type: custom
        tokenizer: standard
        filter: [umlaut_replace]

    filter:
      umlaut_replace:
        type : pattern_replace
        pattern: "ä"
        replacement: "a"


The exception I get on startup is:

INFO: An exception was caught and reported. Message:
org.elasticsearch.ElasticSearchIllegalArgumentException: Token Filter
[umlaut_replace] must have a type associated with it
org.elasticsearch.ElasticSearchIllegalArgumentException: Token Filter
[umlaut_replace] must have a type associated with it

Taking a look at the analysis module, there is a line referencing the
org.elasticsearch package:

type = tokenFilterSettings.getAsClass("type", null,
"org.elasticsearch.index.analysis.", "TokenFilterFactory");

However, the PatternReplaceFilter lives in some org.apache package...

Might this be the cause or am I simply misconfiguring something badly?


Regards, Alexander

Re: Problem configuring PatternReplaceFilter

kimchy
Administrator
Can you do a get settings to see whether the type is really there for the filter (note that settings get munged into key-value pairs)? Also, though I would love to help fix this, for this use case you might want to consider the asciifolding filter (http://www.elasticsearch.org/guide/reference/index-modules/analysis/asciifolding-tokenfilter.html).
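
A minimal sketch of what the asciifolding suggestion could look like, written in the same YAML settings format as the original post. The analyzer name folding_analyzer is a placeholder chosen here for illustration and does not come from the thread. asciifolding and lowercase are built-in token filter names, so no separate filter section should be needed.

```yaml
index:
  analysis:
    analyzer:
      # "folding_analyzer" is a hypothetical name used only in this sketch
      folding_analyzer:
        type: custom
        tokenizer: standard
        # lowercase first, then fold non-ASCII characters (e.g. "ä" becomes "a")
        filter: [lowercase, asciifolding]
```

Note that this folds "ä" to "a", which matches the configuration in the question but not an "ae" style transliteration.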

On Mon, Aug 1, 2011 at 7:08 PM, Alexander Reelsen <[hidden email]> wrote:

Re: Problem configuring PatternReplaceFilter

Alexander Reelsen
Hi,

Completely my fault. I tested against a 0.16 version of elasticsearch,
where the filter was not included yet. It works smoothly with 0.17. Sorry
for that.

I had not upgraded to 0.17 because installing plugins from the
filesystem did not work the way it did in 0.16. I tracked this down to
not using the complete file:/// URL, which 0.17 now requires, whereas
0.16 accepted a plain directory. This resulted in a zip file exception
(which was in fact a file-not-found error). Now our river
implementation also works with 0.17, and we have upgraded.

Thanks for helping, going to hide ashamed behind a rock now :-)


--Alexander

Re: Problem configuring PatternReplaceFilter

Ivan Brusic
Aha! That explains the situation I was experiencing the other day after upgrading. I assumed it was due to the zip file being wrongly named.

-- 
Ivan

On Tue, Aug 2, 2011 at 7:28 AM, Alexander Reelsen <[hidden email]> wrote:

I had not upgraded to 0.17 because installing plugins from the
filesystem did not work the way it did in 0.16. I tracked this down to
not using the complete file:/// URL, which 0.17 now requires, whereas
0.16 accepted a plain directory. This resulted in a zip file exception
(which was in fact a file-not-found error). Now our river
implementation also works with 0.17, and we have upgraded.


Re: Problem configuring PatternReplaceFilter

Jan Fiedler
In reply to this post by Alexander Reelsen
Maybe off topic, but maybe helpful anyway: instead of using the PatternReplaceFilter, you may want to look at the ASCIIFoldingFilter, which automatically converts many non-ASCII characters (such as German umlauts) into their ASCII equivalents (http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/analysis/ASCIIFoldingFilter.html). This way you would not have to define explicit mappings for every character, and you would automatically cover other common cases such as accented characters (as in Crème).

Re: Problem configuring PatternReplaceFilter

Alexander Reelsen
Hi Jan,

On 3 Aug., 09:05, Jan Fiedler <[hidden email]> wrote:
> Maybe off topic but maybe helpful anyway: Instead of using the
> PatternReplaceFilter you may want to look at the ASCIIFoldingFilter that
> automatically converts lots of non ASCII characters (such as German umlauts)
> into their ASCII equivalents (http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/analys...).
Right. As far as I know, this only works if you want to turn ä into
a. In some special cases you might want to produce "ae" instead.


--Alexander

Re: Problem configuring PatternReplaceFilter

Jan Fiedler
Yeah, if you are looking for the 'ä' -> 'ae' mapping, you may find the following thread helpful (http://elasticsearch-users.115913.n3.nabble.com/Folding-German-characters-like-umlauts-td2176078.html). I have not tried the German2 stemmer myself. Working with pure Lucene (2.x back then), I relied on the synonym approach described in that thread.
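
For comparison, here is a rough sketch of the 'ä' -> 'ae' mapping that stays with the pattern_replace filter from the original question. It is not the synonym or stemmer approach from the linked thread. The filter names (umlaut_ae, umlaut_oe, umlaut_ue, sharp_s) are made up for this example, and it assumes elasticsearch 0.17 or later, where pattern_replace is available. The lowercase filter runs first so that upper-case umlauts are covered by the same patterns.

```yaml
index:
  analysis:
    analyzer:
      ae_analyzer:
        type: custom
        tokenizer: standard
        # lowercase first so that "Ä" is matched by the "ä" pattern below
        filter: [lowercase, umlaut_ae, umlaut_oe, umlaut_ue, sharp_s]

    filter:
      # all filter names below are hypothetical, chosen for this sketch
      umlaut_ae:
        type: pattern_replace
        pattern: "ä"
        replacement: "ae"
      umlaut_oe:
        type: pattern_replace
        pattern: "ö"
        replacement: "oe"
      umlaut_ue:
        type: pattern_replace
        pattern: "ü"
        replacement: "ue"
      sharp_s:
        type: pattern_replace
        pattern: "ß"
        replacement: "ss"
```

One filter per character keeps each mapping explicit, at the cost of a longer filter chain; whether that is acceptable depends on how many characters need special treatment.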