duplicate documents in query,

duplicate documents in query,

Georgi Ivanov

Hi,
I have a strange issue.
I get duplicate documents when querying:

GET track_2011*/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "ts": {
              "gte": "2011-08-30T00:00:00Z",
              "lte": "2011-08-31T23:59:00Z"
            }
          }
        },
        {
          "term": {
            "entity_id": {
              "value": "298082"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "ts": {
        "order": "asc"
      }
    }
  ],
  "size": 90
}



Result (there are more, just showing duplicates):
{
  "_index": "track_201108",
  "_type": "position",
  "_id": "298082_1314758608000_1302",
  "_score": null,
  "_source": {
    "ts": 1314758608000,
    "entity_id": 298082,
    "loc": {
      "type": "point",
      "coordinates": [
        103.694783333,
        1.23463333333
      ]
    }
  },
  "sort": [
    1314758608000
  ]
},
{
  "_index": "track_201108",
  "_type": "position",
  "_id": "298082_1314758608000_1302",
  "_score": null,
  "_source": {
    "ts": 1314758608000,
    "entity_id": 298082,
    "loc": {
      "type": "point",
      "coordinates": [
        103.694783333,
        1.23463333333
      ]
    }
  },
  "sort": [
    1314758608000
  ]
}



But if I get the document by ID:

curl -s es01.host.com:9200/track_201108/position/298082_1314758608000_1302 | json_pp
{
   "found" : true,
   "_version" : 1,
   "_type" : "position",
   "_index" : "track_201108",
   "_source" : {
      "hourly" : false,
      "loc" : {
         "type" : "point",
         "coordinates" : [
            103.694783333,
            1.23463333333
         ]
      },
      "ts" : 1314758608000,
      "entity_id" : 298082
   },
   "_id" : "298082_1314758608000_1302"
}




So I have only one document (and it was never updated, since the version is 1).

I don't understand what is going on here.

No special routing, no parent/child relations.

Any ideas?
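
For anyone who wants to check a larger result set for the same symptom, here is a minimal sketch in Python. It assumes the search response has already been parsed into a dict; the function name find_duplicate_hits is made up for illustration and is not part of any Elasticsearch client, and the third hit in the sample is a hypothetical extra document.

```python
from collections import Counter


def find_duplicate_hits(response):
    """Return {(index, type, id): count} for hits that occur more than once."""
    hits = response.get("hits", {}).get("hits", [])
    counts = Counter((h["_index"], h.get("_type"), h["_id"]) for h in hits)
    return {key: n for key, n in counts.items() if n > 1}


# Sample response trimmed to the fields used above.
# The last hit is a hypothetical non-duplicate added for contrast.
sample = {
    "hits": {
        "hits": [
            {"_index": "track_201108", "_type": "position",
             "_id": "298082_1314758608000_1302"},
            {"_index": "track_201108", "_type": "position",
             "_id": "298082_1314758608000_1302"},
            {"_index": "track_201108", "_type": "position",
             "_id": "298082_1314758609000_1302"},
        ]
    }
}

print(find_duplicate_hits(sample))
```

Keying on the (_index, _type, _id) triple rather than on _id alone avoids flagging documents that legitimately share an ID across the monthly indices matched by track_2011*.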

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/abf4a5a9-495f-4480-b326-0d9562c696b1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: duplicate documents in query,

dadoonet
Which Elasticsearch version are you running?

-- 
David Pilato - Developer | Evangelist 






Re: duplicate documents in query,

Georgi Ivanov
1.5.2


Re: duplicate documents in query,

dadoonet
What do you get from: curl -XGET 'http://localhost:9200/track_2011*/'



-- 
David Pilato - Developer | Evangelist 






Re: duplicate documents in query,

dadoonet
Also, could you try:
curl -XGET 'localhost:9200/twitter/_search_shards'



And then search using:
?preference=_shards:0;_primary
?preference=_shards:1;_primary
?preference=_shards:2;_primary

And so on…

Try to locate which shard holds the duplicates.
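
The per-shard check can be scripted. The following is only a sketch in Python: the helper names shard_search_urls and shards_containing are hypothetical, the shard count has to come from your own index settings, and the `_shards:N;_primary` form (shard list first, then a semicolon and the second preference) is how I understand the search preference parameter, so please verify it against your version. The URL builder percent-encodes the preference value; actually sending the requests is left to you.

```python
from urllib.parse import quote


def shard_search_urls(base_url, index, shard_count):
    """Build one _search URL per shard, each restricted to that shard's primary."""
    urls = []
    for shard in range(shard_count):
        # Assumed preference syntax: "_shards:<n>;_primary".
        preference = quote("_shards:%d;_primary" % shard, safe="")
        urls.append("%s/%s/_search?preference=%s"
                    % (base_url.rstrip("/"), index, preference))
    return urls


def shards_containing(doc_id, hits_by_shard):
    """Given {shard_number: [hit, ...]}, list the shards that returned doc_id."""
    return sorted(shard for shard, hits in hits_by_shard.items()
                  if any(hit["_id"] == doc_id for hit in hits))


for url in shard_search_urls("http://localhost:9200", "track_201108", 3):
    print(url)
```

If shards_containing reports two shard numbers for the same _id, the document physically exists in both shards, which narrows the problem down to how it was indexed.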

Are you sure you never used a routing key when indexing one of your docs?
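
To show why the routing question matters, here is a toy model in Python. It is not a diagnosis of this cluster, only an illustration of one way the symptom can arise. Everything in it is an assumption: the shard count of 5, the CRC32 stand-in hash (Elasticsearch uses its own hash function), and the hypothetical routing key. The point is that the same _id indexed once without routing and once with a routing key can end up in two shards, so a search sees two copies while a plain GET looks in only one shard.

```python
import zlib
from itertools import count

SHARD_COUNT = 5  # assumed; the thread does not state the real number


def pick_shard(routing_value):
    # Stand-in hash for illustration only, not the hash Elasticsearch uses.
    return zlib.crc32(routing_value.encode("utf-8")) % SHARD_COUNT


shards = [dict() for _ in range(SHARD_COUNT)]


def index_doc(doc_id, source, routing=None):
    # Without an explicit routing value, the _id itself selects the shard.
    shards[pick_shard(routing or doc_id)][doc_id] = source


def get_doc(doc_id, routing=None):
    # A GET by ID looks in exactly one shard.
    return shards[pick_shard(routing or doc_id)].get(doc_id)


def search_all(doc_id):
    # A search fans out to every shard and merges whatever each one returns.
    return [shard[doc_id] for shard in shards if doc_id in shard]


doc_id = "298082_1314758608000_1302"
# Pick a hypothetical routing key that maps to a different shard than the _id.
routing = next("key%d" % i for i in count()
               if pick_shard("key%d" % i) != pick_shard(doc_id))

index_doc(doc_id, {"entity_id": 298082})                   # default routing
index_doc(doc_id, {"entity_id": 298082}, routing=routing)  # custom routing

print(len(search_all(doc_id)))  # number of copies a search would see
print(get_doc(doc_id))          # the single copy a plain GET finds
```

In this model each copy is a fresh document in its own shard, which would also be consistent with both copies reporting version 1.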

-- 
David Pilato - Developer | Evangelist 




