geo_bounding_box on 10 billion objects


geo_bounding_box on 10 billion objects

Mohamed Lrhazi
Hello,

What resources would it take, and what topology/configuration would I need, to support the following, if it is possible at all:

Index 10 billion simple objects with the following schema, and provide sub-second geo_bounding_box searches:

item:
  location: (lon, lat)
  color: small_string
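
For concreteness, the mapping I have in mind looks something like this (the index name "items" is just an example; location as a geo_point and color as a plain, not_analyzed string):

  # illustrative: index name "items" and field names taken from the schema above
  curl -XPUT 'http://localhost:9200/items' -d '{
    "mappings": {
      "item": {
        "properties": {
          "location": { "type": "geo_point" },
          "color":    { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }'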

I have been doing some tests, and I can easily get sub-second searches with 10 million objects (I am only interested in the hit count, by the way) on a single instance with 32GB RAM and 2 vCPUs, with Elasticsearch started with -Xmx30g -Xms30g.
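
The kind of query I run in those tests looks roughly like this (the bounding-box coordinates are arbitrary; search_type=count because I only need the hit count):

  # arbitrary box; search_type=count skips fetching documents and returns only total hits
  curl -XGET 'http://localhost:9200/items/item/_search?search_type=count' -d '{
    "query": {
      "filtered": {
        "query": { "match_all": {} },
        "filter": {
          "geo_bounding_box": {
            "location": {
              "top_left":     { "lat": 40.8, "lon": -74.1 },
              "bottom_right": { "lat": 40.7, "lon": -73.9 }
            }
          }
        }
      }
    }
  }'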

How many nodes, and how much RAM on each, would you expect to need to reach such a goal? What would I need to learn, tweak, or change from Elasticsearch's default settings for sharding, indexing, and searching?
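
For instance, I assume the default of 5 shards would be far too low at this scale, and that I would set a higher count at index creation time, combined with the mapping above (50 is just a placeholder, not a recommendation):

  # placeholder shard count; shards cannot be changed after index creation
  curl -XPUT 'http://localhost:9200/items' -d '{
    "settings": { "number_of_shards": 50, "number_of_replicas": 1 }
  }'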

Or is Elasticsearch the wrong tool for such a task? Would MongoDB be more appropriate, given the size of the data set?

Thanks a lot,
Mohamed.

Re: geo_bounding_box on 10 billion objects

kimchy
Administrator
Hard to tell how many nodes you will need. Geo bounding box filtering is CPU-heavy; the more nodes you have, the better your search performance will be (and make sure you have enough shards).

Side note: we don't recommend setting ES_HEAP_SIZE to 30gb on a 32gb machine; you want to keep a bit for the OS file system cache. Since geo bounding box requires the lat/lon fields to be loaded into memory, you will need heap for that, but I wouldn't go past the 22gb mark on a 32gb box.
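
A minimal sketch of what that looks like when starting a node (22g follows the guideline above; adjust for your box):

  # leave ~10gb of a 32gb machine to the OS file system cache
  export ES_HEAP_SIZE=22g
  bin/elasticsearch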

Re: geo_bounding_box on 10 billion objects

Mohamed Lrhazi
Thanks a lot, Kimchy. This might require more nodes than I can afford for now :)
