Spatial Query not searching both indexes

Dave O
For test purposes I've created 2 indexes, both with exactly the same data.
When I query each index individually I get the desired results. When I query both indexes together I only get the results from geoindex1.
I thought maybe duplicates were being weeded out, so I deleted some documents from geoindex1, hoping they would then show up from geoindex2, but that doesn't happen.
I also tried just doing geoindex*/geo.... and that doesn't work either.

Not sure if this is a bug or user error. Any help would be appreciated.

curl -XGET 'http://localhost:9200/geoindex1,geoindex2/geo/_search?pretty=true' -d '
{
  "query": {
    "filtered" : {
        "query" : {
            "match_all" : {}
        },
        "filter" : {
            "geo_distance" : {
                "distance" : "500km",
                "location" : {
                    "lat" : 45.59174,
                    "lon" : 11.4050
                }
            }
        }
    }
  }
}'

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: Spatial Query not searching both indexes

Dave O
An update to this. When I use aliases I get the same results. It seems to query only the index that was created first among those in the list. So even if I put geoindex2,geoindex1.... or geoindex4,geoindex2.... it pulls from the lowest-numbered index (which I created first in my test).

If I run a NON-spatial query against all indexes, using either hard-coded names (geoindex1,geoindex2,geoindex3....) or an alias referencing all indexes, the query DOES work. So it appears to be related just to spatial queries?

Thanks

 
 
 
 

Re: Spatial Query not searching both indexes

Dave O
Please see my test case on GitHub, where I re-created the problem. Any help would be greatly appreciated.
I am running Elasticsearch 0.20.4 on a single instance (laptop), in VirtualBox with CentOS 6.3 as the OS.

https://gist.github.com/oakstream/5033131

Thank you


 
 

Re: Spatial Query not searching both indexes

Clinton Gormley-2
Hiya

> Please see my testcase on github where I re-created the problem.  Any
> help would be greatly appreciated.
> I am running elasticsearch 0.20.4 on a single instance (laptop) using
> virtual box running a centos 6.3 OS.

Your term query for country 'AD' won't work because "country" is defined
as a field of { type: "string" }, which means that it is "analyzed",
which means that "AD" will be indexed as "ad".  But you are searching
for the EXACT term "AD", so it won't be found.

Set the country field to { type: "string", index: "not_analyzed" }
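
A rough illustration of why the exact term misses, as a Python sketch. The `analyze` function is a hypothetical stand-in for an analyzed string field (it only lowercases and splits on non-alphanumerics; the real analyzer does more), and the tiny inverted index is a toy, not Elasticsearch's actual data structure.

```python
import re

def analyze(text):
    """Toy stand-in for an analyzed string field: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def index_docs(docs, analyzed):
    """Build a toy inverted index: term -> set of doc ids."""
    inverted = {}
    for doc_id, value in docs.items():
        # A not_analyzed field is indexed as one unmodified term.
        terms = analyze(value) if analyzed else [value]
        for term in terms:
            inverted.setdefault(term, set()).add(doc_id)
    return inverted

def term_lookup(inverted, term):
    """A term query looks up the term exactly as given; no analysis is applied."""
    return inverted.get(term, set())

docs = {1: "AD", 2: "FR"}  # made-up country values for illustration
analyzed_index = index_docs(docs, analyzed=True)
exact_index = index_docs(docs, analyzed=False)

print(term_lookup(analyzed_index, "AD"))  # set(): only "ad" was indexed
print(term_lookup(analyzed_index, "ad"))  # {1}
print(term_lookup(exact_index, "AD"))     # {1}
```

In this toy model, either lowercasing the search term or indexing the field unanalyzed makes the lookup succeed; the mapping change suggested above corresponds to the second option.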

Then as far as why results are not being returned from geoindex11/12, I
think you're just getting the first 10 results which happen to be in
geoindex10.

Try setting { size: 100} in your query, to see more results
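
A small Python sketch of why a default-sized page of results can appear to come from a single index. It assumes, purely for illustration, that every hit has the same score (as under match_all) and that ties end up ordered by index name; the real tie-breaking inside Elasticsearch may differ. The index names come from the reply above; the document counts are invented.

```python
def search(indexes, size=10):
    """Toy scatter/gather: collect hits from each index, sort, keep the first `size`."""
    hits = []
    for name, doc_count in indexes.items():
        for doc_id in range(doc_count):
            hits.append({"_index": name, "_id": doc_id, "_score": 1.0})
    # Assumption: equal scores, ties broken by index name and then by id.
    hits.sort(key=lambda h: (-h["_score"], h["_index"], h["_id"]))
    return {"total": len(hits), "hits": hits[:size]}

# Invented document counts: 25 matching documents per index.
indexes = {"geoindex10": 25, "geoindex11": 25, "geoindex12": 25}

page = search(indexes)  # default size of 10
print(page["total"])                        # 75
print({h["_index"] for h in page["hits"]})  # {'geoindex10'}

bigger = search(indexes, size=100)
print(sorted({h["_index"] for h in bigger["hits"]}))
# ['geoindex10', 'geoindex11', 'geoindex12']
```

The total hit count already covers all three indexes even when the first page shows only one of them, so comparing the total against the per-index counts is a quick way to tell "not searched" apart from "not on this page".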

clint



Re: Spatial Query not searching both indexes

Dave O
Hi Clinton, great! Yes, this worked beautifully in both cases. I have some learning to do! I'm making the move from the RDBMS world to ES. Great product. Good work on the Perl client also; I've used it a little bit for my client stuff. I'm just trying to get familiar with the technology first before I get too far into the APIs.

Issue Resolved!

Mike

 
 

Re: Spatial Query not searching both indexes

Dave O
Hi Clint,
I had one observation that I was hoping you could help with. Not sure if this is just my laptop or normal overhead (which it appears to be), but basically I have 4 indexes, each with about 9 million records (exactly the same records in each index). I just wanted to do some benchmarking/performance testing to see how querying multiple indexes would perform.

I'm seeing about half a second for one index, and it slowly increases to about 1.3 seconds when querying all 4. I'm doing the basic polygon feature in my test case, no aggregates or anything. I would expect a slight increase as indexes are added, and this increase is minor, so I'm okay with it; it seems like it could be normal (especially on my laptop VM).

I was just wondering how parallelization or distribution works when querying multiple indexes (or a single index with many shards). I reviewed the documentation on parallel processes but was a little unclear.

If I have a query that hits 4 indexes, do 4 processes (or 4 threads) get created? Would each one use a separate CPU if I have 4 CPUs? Is there a way to tune this? As an example, let's say I have 20 indexes that I need to query at one time (all on the same box/node). 20 processes may overload the system. Can the process count be controlled? Or do I just need to know the limitations of the system, and that maybe it can only handle 10 processes?

If I have 20 servers, each with 1 index, and I query them all, would this behave any differently than 1 server with 20 indexes (as it relates to the number of processes created)?

My hope is that when I issue a query against, say, 10 indexes, 10 processes get created simultaneously, each process queries its targeted index, and the results are then aggregated and returned to the user, rather than 10 processes running serially.

Thank you





 
 

Re: Spatial Query not searching both indexes

Clinton Gormley-2
Hi Dave


> I'm noticing about half a second for one index and then it slowly
> increases up to about 1.3 seconds when adding all 4.  I'm doing the
> basic polygon feature in my test case, no aggregates or anything.  I
> would expect a slight increase as indexes are added and this increase
> is very minor so I'm okay with it and this seems like it could be
> normal. (especially on my Laptop VM)

How much ES_HEAP_SIZE are you giving to ES?  And how much RAM are you
leaving to file system caches?

For geo calculations, the field value for every document in every index
being queried needs to be loaded into memory. Once loaded, that "cache"
isn't evicted by default, to speed up future queries.

But it can take up a lot of RAM, esp if you have small heap sizes.
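
As a back-of-the-envelope check in Python, using the figures from this thread (4 indexes, about 9 million documents each). The 16 bytes per document is an assumption (one 8-byte latitude plus one 8-byte longitude, ignoring any per-entry overhead), not a measured number.

```python
DOCS_PER_INDEX = 9_000_000   # from the thread
INDEX_COUNT = 4              # from the thread
BYTES_PER_POINT = 2 * 8      # assumption: lat + lon held as two 8-byte doubles

def field_cache_bytes(docs, indexes, bytes_per_point=BYTES_PER_POINT):
    """Rough lower bound on the heap needed to hold every geo point in memory."""
    return docs * indexes * bytes_per_point

one = field_cache_bytes(DOCS_PER_INDEX, 1)
total = field_cache_bytes(DOCS_PER_INDEX, INDEX_COUNT)
print(f"{one / 2**20:.0f} MiB for one index")   # 137 MiB for one index
print(f"{total / 2**20:.0f} MiB for all four")  # 549 MiB for all four
```

Under that assumption, querying all four indexes needs roughly half a gigabyte of heap for the geo field alone, which is worth comparing against the heap size actually configured.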
>
> I was just wondering how parallelization or distribution works when
> querying multiple indexes (or a single index with many shards).   I
> reviewed the documentation on parallel processes but was a little
> unclear.

Each shard is queried in parallel. So if you have 1 index with 5 shards,
or 5 indices with 1 shard, both would result in querying 5 shards in
parallel.
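
The fan-out described above can be sketched in Python with a thread pool. This is only an analogy: `query_shard` is a hypothetical placeholder that sleeps instead of searching, and `max_workers` stands in loosely for a per-node thread limit; none of it is Elasticsearch code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def query_shard(shard):
    """Hypothetical per-shard search: pretend each shard takes 0.2 s."""
    time.sleep(0.2)
    return [f"{shard}-hit"]

def search(shards, max_workers):
    """Query shards concurrently (bounded by max_workers), then merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_shard = list(pool.map(query_shard, shards))  # keeps input order
    return [hit for hits in per_shard for hit in hits]

one_index_five_shards = [f"idx[{n}]" for n in range(5)]
five_indexes_one_shard = [f"idx{n}[0]" for n in range(5)]

for shards in (one_index_five_shards, five_indexes_one_shard):
    start = time.perf_counter()
    hits = search(shards, max_workers=5)
    elapsed = time.perf_counter() - start
    # Both layouts fan out to 5 shards, so both should take about as long
    # as one shard rather than five times as long.
    print(len(hits), f"{elapsed:.2f}s")
```

Because the placeholder only sleeps, the threads never compete for CPU. Real shard searches on one node do compete, so this sketch shows the fan-out but not the contention.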

However, a single shard is concurrent and can make full use of the
resources of a single node, so hosting many shards on a single node
doesn't buy you concurrency.  In fact, it'd probably slow things down a
bit, as there would be more context switches.

As you add nodes, shards are redistributed to them, spreading the load
and giving each shard access to more resources.
>
> If I have a query that queries 4 indexes, do 4 processes (or 4 threads)
> get created?  Would each one use a separate CPU if I have 4 CPUs?
> Is there a way to tune this?

You can specify the max number of threads per node:
http://www.elasticsearch.org/guide/reference/modules/threadpool.html

> As an example, let's say I have 20 indexes that I need to query at
> one time (all on the same box/node).  20 processes may overload the
> system.  Can the process count be controlled?  Or do I just need to
> know the limitations of the system and that maybe it can only handle
> 10 processes?
>
> If I have 20 servers each with 1 index and I query them all, would this
> behave any differently than 1 server with 20 indexes (as it relates to
> the number of processes created)?
>
> My hope is that when I issue a query against, say, 10 indexes,
> 10 processes get created simultaneously, each process queries its
> targeted index, and the results are then aggregated and returned to the
> user, rather than 10 processes running serially.

That's pretty much how it happens.

clint
