I have got a little Problem with my synonym filter ....


Ste Phan
I built a small sample of what I am doing.

My test synonyms file (test.syn, placed in my /etc/elasticsearch folder) is:

aaa,bbb,ccc,ddd
www,xxx,yyy,zzz
eee,fff,ggg,hhh => 111
sss,ttt,uuu,vvv => 222
rrr => 333,444,555
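
For readers unfamiliar with this file format, here is a rough Python sketch of how I understand the two kinds of rules: a plain comma-separated line makes all its terms equivalent (each one expands to the whole group), while a line with "=>" replaces any term on the left with the terms on the right. The names `parse_rules` and `expand` are made up for illustration, the sketch handles single-word terms only, and it is not how Elasticsearch implements the filter.

```python
TEST_SYN = """\
aaa,bbb,ccc,ddd
www,xxx,yyy,zzz
eee,fff,ggg,hhh => 111
sss,ttt,uuu,vvv => 222
rrr => 333,444,555
"""


def split_terms(part):
    # Split a comma-separated list and lowercase it (models "ignore_case": true).
    return [t.strip().lower() for t in part.split(",") if t.strip()]


def parse_rules(lines):
    """Build a {term: [output terms]} table from synonym rule lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: left-hand terms are replaced by right-hand terms.
            left, right = line.split("=>", 1)
            sources, targets = split_terms(left), split_terms(right)
        else:
            # Equivalent synonyms: every term expands to the whole group.
            sources = targets = split_terms(line)
        for source in sources:
            bucket = table.setdefault(source, [])
            for target in targets:
                if target not in bucket:
                    bucket.append(target)
    return table


def expand(table, token):
    # Terms without a rule pass through unchanged.
    return table.get(token.lower(), [token.lower()])


rules = parse_rules(TEST_SYN.splitlines())
print(expand(rules, "aaa"))  # ['aaa', 'bbb', 'ccc', 'ddd']
print(expand(rules, "eee"))  # ['111']
print(expand(rules, "rrr"))  # ['333', '444', '555']
```

Under this reading, a document containing "eee" is indexed under "111" only, so a search for "eee" with a search analyzer that has no synonym filter would not be expected to find it.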

I created an index like so:

PUT /testindex?pretty
{
    "settings": {
        "analysis": {
            "analyzer": {
                "myIndexAnalyzer": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "mySynonymsFilter"
                    ]
                },
                "mySearchAnalyzer": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase"
                    ]
                }
            },
            "filter": {
                "mySynonymsFilter": {
                    "type": "synonym",
                    "ignore_case": true,
                    "synonyms_path": "test.syn"
                }
            }
        }
    },
    "mappings": {
        "testitem": {
            "properties": {
                "title": {
                    "type": "string",
                    "index_analyzer": "myIndexAnalyzer",
                    "search_analyzer": "mySearchAnalyzer"
                }
            }
        }
    }
}

and added some data:

POST /_bulk
{ "index": { "_index": "testindex", "_type": "testitem", "_id": "1" }}
{ "title":    "aaa test daten eintrag." }
{ "index": { "_index": "testindex", "_type": "testitem", "_id": "2" }}
{ "title":    "bbb test daten eintrag." }
{ "index": { "_index": "testindex", "_type": "testitem", "_id": "3" }}
{ "title":    "eee test daten eintrag." }

Testing myIndexAnalyzer using

POST /testindex/_analyze?analyzer=myIndexAnalyzer&pretty
{aaa test daten eintrag}

results in:

{
   "tokens": [
      {
         "token": "aaa",
         "start_offset": 1,
         "end_offset": 4,
         "type": "SYNONYM",
         "position": 1
      },
      {
         "token": "bbb",
         "start_offset": 1,
         "end_offset": 4,
         "type": "SYNONYM",
         "position": 1
      },
      {
         "token": "ccc",
         "start_offset": 1,
         "end_offset": 4,
         "type": "SYNONYM",
         "position": 1
      },
      {
         "token": "ddd",
         "start_offset": 1,
         "end_offset": 4,
         "type": "SYNONYM",
         "position": 1
      },
      {
         "token": "test",
         "start_offset": 5,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}

This looks fine to me.

When searching this index, I expected to find the records with IDs 1 and 2 when searching for any of "aaa", "bbb", "ccc", or "ddd".

What am I doing wrong?

TIA
Ste Phan


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/64e90076-f905-4490-bfe8-3b1607e5e98a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: I have got a little Problem with my synonym filter ....

Ste Phan
 
I forgot to mention that if I search for "aaa" I receive only the record with _id = 1, and if I search for "bbb" I receive only the record with _id = 2, nothing else.
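
To make the difference between what was expected and what was observed concrete, here is a toy Python model of an inverted index built from the three sample titles, once with index-time synonym expansion and once without. It is only a sketch: `tokenize`, `build_index` and `search` are made-up helpers, the tokenizer is a crude stand-in for "standard" plus "lowercase", and only the two synonym rules that touch the sample data are included.

```python
from collections import defaultdict

# Only the rules relevant to the sample documents (taken from test.syn).
GROUP = ["aaa", "bbb", "ccc", "ddd"]
SYNONYMS = {term: GROUP for term in GROUP}
SYNONYMS["eee"] = ["111"]  # from "eee,fff,ggg,hhh => 111"

DOCS = {
    "1": "aaa test daten eintrag.",
    "2": "bbb test daten eintrag.",
    "3": "eee test daten eintrag.",
}


def tokenize(text):
    # Crude stand-in for the standard tokenizer followed by lowercase.
    words = (w.strip(".,").lower() for w in text.split())
    return [w for w in words if w]


def build_index(docs, use_synonyms):
    """Map each indexed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        for token in tokenize(title):
            terms = SYNONYMS.get(token, [token]) if use_synonyms else [token]
            for term in terms:
                index[term].add(doc_id)
    return index


def search(index, word):
    # The search side only lowercases, like mySearchAnalyzer.
    return sorted(index.get(word.lower(), set()))


with_syn = build_index(DOCS, use_synonyms=True)
without_syn = build_index(DOCS, use_synonyms=False)

print(search(with_syn, "bbb"))     # ['1', '2']  (the expected result)
print(search(without_syn, "bbb"))  # ['2']       (the observed result)
```

The observed result matches the "without synonyms" case, which would be consistent with the synonym analyzer not being applied to the field that is actually queried. That is an inference from the toy model, not something confirmed for the sample index.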


Re: I have got a little Problem with my synonym filter ....

Ivan Brusic
In reply to this post by Ste Phan

What kind of query are you executing? Are you querying against a specific field? A match query against the title field should work.

When using the analyze API, explicitly state the field rather than the analyzer to get a more accurate picture of what really goes on.
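
As a rough illustration of both suggestions, the Python sketch below only builds the two requests (a match query on title, and the field-based form of the analyze call) and prints them; it does not send anything. The helper names are hypothetical, and the `field` and `text` query parameters are the analyze API form used by the Elasticsearch version discussed in this thread; newer releases may expect a JSON body instead, so check this against your version.

```python
import json
from urllib.parse import quote, urlencode


def match_query_body(field, text):
    """Request body for a match query against a single field."""
    return {"query": {"match": {field: text}}}


def analyze_by_field_path(index, field, text):
    """URL path for analyzing text with the analyzer configured for a field."""
    params = urlencode({"field": field, "text": text})
    return "/%s/_analyze?%s" % (quote(index), params)


# Body to POST to /testindex/_search
print(json.dumps(match_query_body("title", "bbb")))

# Path to GET in order to see which tokens the title field really produces
print(analyze_by_field_path("testindex", "title", "aaa test"))
```

If the tokens returned for the field do not include the synonyms, the field is not using the analyzer you think it is, whatever the analyzer returns when called by name.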

Cheers,

Ivan

On Apr 21, 2015 11:40 AM, "Ste Phan" <[hidden email]> wrote:
... I build a little sample of what I do.


Re: I have got a little Problem with my synonym filter ....

Ste Phan
In reply to this post by Ste Phan
I tried multi_match queries.

The small example seems to work in the meantime, though I don't know why. But my original index has the same problem.

I am attaching the synonyms file as well as the create statement.

Analyzing this via:

GET /myindex/_analyze?field=article.authors
{gumbel}

results in:

{
   "tokens": [
      {
         "token": "gumbel",
         "start_offset": 1,
         "end_offset": 7,
         "type": "<ALPHANUM>",
         "position": 1
      }
   ]
}

No synonyms are returned, contrary to what I would expect. Searching for a synonym, for example "gumble", gives no results.

I would be glad to hear from you ... TIA

Ste Phan


Attachments: myindex-create.txt (3K), myindex.syn (217 bytes)

Re: I have got a little Problem with my synonym filter ....

Ste Phan
OK, I found my error: the structure of the index definition was wrong. Sorry.
