analyzer randomly applied

Chris K Wensel
hey all

when I create an index, I register an analyzer named 'csv' to use with a 'tags' field, as below.

    settingsBuilder.put( "index.analysis.analyzer.csv.type", "pattern" );
    settingsBuilder.put( "index.analysis.analyzer.csv.pattern", "," );

thus, stuffing "a,b,c" into a 'tags' field and making a facet query returns "a","b","c".

which is exactly what I want.

Except, if the values are "a-b,a-b,a-c", the values are tokenized on both "," and "-", so a facet query returns "a", "b", "c", not "a-b", etc.

But not always!

If I run a test that stuffs a single document and then runs a facet query, sometimes the "-" isn't tokenized on, and sometimes it is. I would say the "-" gets parsed out about 30% of the time.
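
To make the two outcomes concrete, here is a minimal Java sketch using only `java.util.regex`. It is not Elasticsearch's analyzer code: `splitOnComma` mimics what a pattern analyzer with the pattern `,` is expected to emit, and `splitOnCommaAndHyphen` is a hypothetical stand-in for whatever tokenization produces the unwanted result. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

class CsvTokenSketch {
    // Intended behaviour: split on "," only, keep distinct terms in order.
    static Set<String> splitOnComma(String value) {
        return new LinkedHashSet<>(Arrays.asList(Pattern.compile(",").split(value)));
    }

    // Hypothetical stand-in for the unwanted behaviour: split on "," and "-".
    static Set<String> splitOnCommaAndHyphen(String value) {
        return new LinkedHashSet<>(Arrays.asList(Pattern.compile("[,-]").split(value)));
    }

    public static void main(String[] args) {
        String tags = "a-b,a-b,a-c";
        System.out.println(splitOnComma(tags));          // prints [a-b, a-c]
        System.out.println(splitOnCommaAndHyphen(tags)); // prints [a, b, c]
    }
}
```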

I've tried the following as well, and get the same random results

    settingsBuilder.put( "index.analysis.analyzer.csv.type", "custom" );
    settingsBuilder.put( "index.analysis.analyzer.csv.tokenizer", "csvPattern" );
    settingsBuilder.put( "index.analysis.analyzer.csv.filter", "lowercase" );

    settingsBuilder.put( "index.analysis.tokenizer.csvPattern.type", "pattern" );
    settingsBuilder.put( "index.analysis.tokenizer.csvPattern.pattern", "," );

FWIW, my mapping of 'tags' to 'csv' does work, just not _consistently_ across invocations of the test. I'm using a dynamic template, defined here

{
  "template_tags": {
    "mapping": {
      "store": "yes",
      "analyzer": "csv",
      "type": "string"
    },
    "match": "tags"
  }
}

thoughts?

--
Chris K Wensel
[hidden email]
http://concurrentinc.com

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.



Re: analyzer randomly applied

Martijn v Groningen
Hi Chris,

Not sure why this happens. Maybe your mapping isn't applied on all indices?
What you're doing should work: the value of the tags field should be tokenised on `,`. I created the following gist:

Can you check whether your issue still occurs if you do the indexing and searching the same way I do in this gist?
(I used ES version 0.20.4)

Martijn

On 10 February 2013 04:10, Chris K Wensel <[hidden email]> wrote:





--
Kind regards,

Martijn van Groningen


Re: analyzer randomly applied

Chris K Wensel
Not sure why this happens. Maybe your mapping isn't applied on all indices?

we only have one index, with three document types: two are reciprocating parents and the third is just a child (not nested, of course), though all three have nested documents. the tags are not in those nested elements.

turns out this has been a persistent problem for the last 9-12 months of ES releases; we just stopped using a "-" in our tests to stop the random failures.

I think I just need to go deep and see what's happening internally.

I suspect your gist will work without issues with such a simple document.

though it may show up if I wrap line 22 with a for loop, since the issue is that the analyzer is randomly not applied properly on a PUT. that is, some PUTs are properly parsed and a smaller percentage are not.

ckw



Re: analyzer randomly applied

Chris K Wensel

ok, this does repro the problem


note: I had it fail when using doc id 1, and also when using a $RANDOM doc id (the last failure below is from that)

Sometimes it works
{"took":44,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"tags","_type":"tag","_id":"1","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
  }}]},"facets":{"tags":{"_type":"terms","missing":0,"total":2,"other":0,"terms":[{"term":"a-c","count":1},{"term":"a-b","count":1}]}}}

query:
{"took":40,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"tags","_type":"tag","_id":"1","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
 }}]},"facets":{"tags":{"_type":"terms","missing":0,"total":2,"other":0,"terms":[{"term":"a-c","count":1},{"term":"a-b","count":1}]}}}


Less frequently it does not

{"took":42,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"tags","_type":"tag","_id":"1","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
 }}]},"facets":{"tags":{"_type":"terms","missing":0,"total":2,"other":0,"terms":[{"term":"c","count":1},{"term":"b","count":1}]}}}

{"took":57,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":96,"max_score":1.0,"hits":[{"_index":"tags","_type":"tag","_id":"19100","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
 }},{"_index":"tags","_type":"tag","_id":"26775","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
 }},{"_index":"tags","_type":"tag","_id":"6971","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
 }},{"_index":"tags","_type":"tag","_id":"2070","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
 }},{"_index":"tags","_type":"tag","_id":"17185","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
 }},{"_index":"tags","_type":"tag","_id":"26016","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
 }},{"_index":"tags","_type":"tag","_id":"1971","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
 }},{"_index":"tags","_type":"tag","_id":"2657","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
 }},{"_index":"tags","_type":"tag","_id":"19504","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
 }},{"_index":"tags","_type":"tag","_id":"19179","_score":1.0, "_source" : {
  "tags" : "a-b,a-b,a-c"
 }}]},"facets":{"tags":{"_type":"terms","missing":0,"total":192,"other":0,"terms":[{"term":"c","count":96},{"term":"b","count":96}]}}}
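
For telling the two kinds of response apart automatically when a test is looped, a rough Java sketch follows. It pulls the `"term"` values out of the raw response with a regular expression instead of a JSON parser, which is an assumption made to keep it dependency-free; it would be confused by a document that itself contains a field called `term`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FacetTermCheck {
    // Matches "term":"<value>" and captures the value.
    private static final Pattern TERM = Pattern.compile("\"term\"\\s*:\\s*\"([^\"]*)\"");

    // Collects every "term" value found in a raw search response.
    static List<String> facetTerms(String responseJson) {
        List<String> terms = new ArrayList<>();
        Matcher m = TERM.matcher(responseJson);
        while (m.find()) {
            terms.add(m.group(1));
        }
        return terms;
    }

    // True when at least one term still contains "-", i.e. only "," was split on.
    static boolean hyphensPreserved(String responseJson) {
        return facetTerms(responseJson).stream().anyMatch(t -> t.contains("-"));
    }

    public static void main(String[] args) {
        // Facet sections copied from the responses above, trimmed to the relevant part.
        String good = "{\"facets\":{\"tags\":{\"terms\":[{\"term\":\"a-c\",\"count\":1},{\"term\":\"a-b\",\"count\":1}]}}}";
        String bad = "{\"facets\":{\"tags\":{\"terms\":[{\"term\":\"c\",\"count\":1},{\"term\":\"b\",\"count\":1}]}}}";
        System.out.println(facetTerms(good) + " " + hyphensPreserved(good)); // prints [a-c, a-b] true
        System.out.println(facetTerms(bad) + " " + hyphensPreserved(bad));   // prints [c, b] false
    }
}
```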

On Feb 15, 2013, at 10:56 AM, Chris K Wensel <[hidden email]> wrote:


Re: analyzer randomly applied

simonw-2
Hey Chris,

this gist doesn't reproduce for me, neither on the current master nor on 0.20.4. What I think can happen here is that the template is not applied when the tags index is created; this would explain what you see. Apparently the "wrong" analyzer is consistently applied to all the documents. Can you try to get this to fail again and, if it fails, pull the mapping from the ES instance you run this against? -> curl -XGET 'http://localhost:9200/tag/_mapping'
I'd be interested to know whether the index gets created without the template being applied to it. Maybe there is a race in the template creation code. I'll try to come up with a test case for this and stress it a little next week.
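
A rough Java sketch of that mapping check, for use inside a test run, is below. Everything in it is an assumption for illustration: the class and method names are hypothetical, the default URL uses the index name `tags` seen in the `_index` values of the search responses in this thread, and the check is a plain text heuristic, not a JSON parse. It only looks at the text after the first `"properties"` key, so that an analyzer named in the dynamic template definition alone does not count as the field being mapped with it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Pattern;

class MappingCheck {
    // Heuristic: is "analyzer":"<name>" mentioned after the first "properties" key?
    static boolean fieldUsesAnalyzer(String mappingJson, String analyzerName) {
        int props = mappingJson.indexOf("\"properties\"");
        if (props < 0) {
            return false; // no concrete field mappings in the response
        }
        Pattern p = Pattern.compile(
                "\"analyzer\"\\s*:\\s*\"" + Pattern.quote(analyzerName) + "\"");
        return p.matcher(mappingJson.substring(props)).find();
    }

    // Fetches the mapping over HTTP; the URL is supplied by the caller.
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Assumed default; pass the real mapping URL as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:9200/tags/_mapping";
        String mapping = fetch(url);
        System.out.println(mapping);
        System.out.println("csv analyzer on a field: " + fieldUsesAnalyzer(mapping, "csv"));
    }
}
```

One possible reading of the failing facet output, offered as an assumption and not something established in this thread: terms "b" and "c" with no "a" are what a word-based default analyzer with English stop-word removal would produce for "a-b,a-b,a-c", which would fit a field that never got the `csv` analyzer.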

simon
On Friday, February 15, 2013 8:14:36 PM UTC+1, Chris K Wensel wrote:


Re: analyzer randomly applied

Chris K Wensel

from the data that produces the last case, it is indeed missing
{"error":"IndexMissingException[[tag] missing]","status":404}

that said, per my original email, it is not missing when I see the test failures: I've double-checked that the mappings exist. further, not all documents (tags) are mis-parsed.

I'll try and dig deeper into the ES code at some point.

ckw

On Feb 16, 2013, at 7:00 AM, simonw <[hidden email]> wrote:


Re: analyzer randomly applied

simonw-2
hmm, I think my link was broken; it should be 'tags', not 'tag', given the gist, right?

simon

On Sunday, February 17, 2013 2:23:49 AM UTC+1, Chris K Wensel wrote:


Re: analyzer randomly applied

Chris K Wensel
oops, crap, and I wiped the data for that.

I added set -e to make sure the server was fully up, and I'm not reproducing the problem now via the bash script
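
For the test harness side, a more explicit way to wait for the server could look like the Java sketch below. Roughly speaking, `set -e` in bash makes a script stop at the first failing command; it does not wait for anything by itself. The sketch polls until a probe succeeds. The class name, method names, URL, attempt count and sleep interval are all assumptions for illustration, and treating any HTTP 200 from the root URL as "up" is a simplification.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.function.BooleanSupplier;

class WaitForServer {
    // Polls `ready` up to `attempts` times, sleeping `sleepMillis` after each miss.
    static boolean waitUntil(BooleanSupplier ready, int attempts, long sleepMillis) {
        for (int i = 0; i < attempts; i++) {
            if (ready.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
        }
        return false;
    }

    // Simplified readiness probe: an HTTP 200 from the given URL counts as "up".
    static boolean respondsOk(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(1000);
            conn.setReadTimeout(1000);
            return conn.getResponseCode() == 200;
        } catch (IOException e) {
            return false; // not reachable yet
        }
    }

    public static void main(String[] args) {
        String url = "http://localhost:9200/"; // assumed local node address
        boolean up = waitUntil(() -> respondsOk(url), 30, 1000);
        System.out.println(up ? "server is up" : "gave up waiting");
    }
}
```

Only after `waitUntil` returns true would the harness go on to create the index, put the mapping and index documents.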

i'll see if I can get a replay of the calls during our tests and try to reproduce independently of the test harness.

ckw

On Feb 17, 2013, at 10:35 AM, simonw <[hidden email]> wrote:
