understanding scaling & clustering decision making

Daniel Myung
Hello,

We are in the midst of transitioning much of our data into ES, but have been running into performance issues loading data into our index. 

As recommended in other threads on this group, I've created a separate testbed for understanding the interaction of ES with our data.

Summary question:

* What are the main resource constraints and signals that indicate we need to scale our hardware or cluster? Is it a maximum number of documents of a given type per shard, plus the corresponding disk and memory footprint that shard takes while running? Our experience below makes it difficult (for me, currently) to see a clear decision-making path based on the behavior I am observing.

Long story:

We have two major document types, so we've created two indices, one for each type. To assess the "limits" of our data and ES, we set them up as single-shard indices for testing.

This runs on a single node, a Rackspace Cloud 8GB machine with 4 CPUs assigned.
We've set the Java heap to 75% of available RAM, i.e. 6GB.
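
For reference, with the stock startup script (which reads the ES_HEAP_SIZE environment variable), that heap sizing would be set roughly like this; the values are just illustrative:

export ES_HEAP_SIZE=6g     # 75% of the 8GB machine described above
./bin/elasticsearch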

We're running bulk indexing with the following settings for the single shard indices:

{
    "index": {
        "index.number_of_replicas": "0", 
        "merge.policy.merge_factor": 30, 
        "refresh_interval": "-1", 
        "store.throttle.max_bytes_per_sec": "5mb", 
        "store.throttle.type": "merge"
    }
}
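
For the record, settings like these can be applied per index through the update-settings API, along these lines (the index name is a placeholder, and which settings can be changed on a live index depends on the ES version):

curl -XPUT 'localhost:9200/case_index/_settings' -d '
{
    "index.number_of_replicas": 0,
    "index.merge.policy.merge_factor": 30,
    "index.refresh_interval": "-1",
    "index.store.throttle.max_bytes_per_sec": "5mb",
    "index.store.throttle.type": "merge"
}
'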


We are simultaneously running bulk operations against the two document types and their respective indices.

One index, FORM_INDEX, is humming along fine: bulk index operations of 500 documents have never taken more than 2-3 seconds per POST, and they are usually in the 100-500ms range.

As of this writing, the index (and lone shard) is 1.2GB and 300,000 documents, with no change in performance. So far so good!
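
For reference, each of these 500-document POSTs is a standard _bulk request, shaped roughly like this (index, type, and field names here are made up):

curl -XPOST 'localhost:9200/form_index/_bulk' --data-binary '
{ "index" : { "_type" : "form", "_id" : "1" } }
{ "form_name" : "intake", "received_on" : "2013-02-25" }
{ "index" : { "_type" : "form", "_id" : "2" } }
{ "form_name" : "followup", "received_on" : "2013-02-25" }
'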


The other index, CASE_INDEX, was humming along fine as well: bulk index operations of 500 documents were in the sub-1000ms range. However, once it hit 200,000 documents, the POSTs started inching up in time: 10 seconds, then 30, 60, 120 seconds.

It was interminably slow for about an hour during this time, but then it inexplicably sped up to < 1 second bulk insert times. It experienced another blowup in insertion times, but now it is back to sub 1500ms insertion times.

The CASE_INDEX stats right now: 7GB index (and lone shard) size, 266,000 documents.

ES is reporting its heap usage at 943MB, per bigdesk.

Initially I figured it was understandable that CASE_INDEX would slow down relative to FORM_INDEX, given the complexity of the mappings we have defined and the indexing we do on it... but I would have expected a more graceful decline in performance signaling "you are reaching the end of the line on this shard". Instead it's rather bursty.

My main question: I'm seeking a decision-making heuristic of some sort for understanding when to add resources to our cluster versus adjusting our shard count and JVM settings. Is the desired outcome some magic-number limit on num_docs and size for CASE_INDEX and FORM_INDEX respectively, for a given machine RAM size?

I ask because I ran into a similarly precipitous decline in bulk-indexed docs per second on our other testbed, which uses 5-shard indices, and it likewise magically recovered with no intervention on our end. That is the same 8GB hardware, running CASE_INDEX and FORM_INDEX at 5 shards with 500k and 1 million docs respectively, using 4GB out of the 6GB heap; a similar bulk indexing run hit the same fast-then-slow insertion rates.

I'm ill-equipped to explain and justify throwing more (virtual) hardware at our environment when our single-shard setup exhibits the same behavior as our presumably overburdened machine.

Thanks,

Dan

Re: understanding scaling & clustering decision making

joergprante@gmail.com
Hi,

Thanks for providing so many details. I think your CASE_INDEX, at 7GB, was simply hit by Lucene segment merging. Do you monitor CPU/memory/load on your nodes?
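
For example (exact endpoints vary a bit between versions), the node stats and cluster health APIs plus plain OS tools give a quick picture:

curl 'localhost:9200/_nodes/stats?pretty'       # per-node heap, GC, indexing and merge stats
curl 'localhost:9200/_cluster/health?pretty'    # overall cluster state
top                                             # CPU and memory at the OS level
iostat -x 1                                     # disk utilization and I/O wait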

Note that the index is bigger than your heap memory, and ES must have kicked off Lucene segment merging, which is very heavy on I/O reads and writes. This merging is vital, but it can bring a system nearly to a halt for a period of time, maybe seconds, maybe minutes. Because of the sudden heavy I/O wait in this step, you don't see a soft decline in performance; it's all or nothing. And there is no "magical recovery": it is just the end of a large segment merge. These large merges are rare; they may happen every 10 or 20 minutes, or even hours apart, during constant bulk indexing, depending on your machine setup.

First, your disk I/O subsystem seems too slow for your requirements; think about faster disk writes. This is the slowest part of the whole system, and improving it helps quickly without changing any code.

Second, it also appears that a merge.policy.merge_factor of 30 may be a bit too high for your setup to handle segment merges gracefully; consider reducing it to 20, or even 10.
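
If your index uses a log-based merge policy that honors merge_factor, that could be changed with something like this (index name is a placeholder):

curl -XPUT 'localhost:9200/case_index/_settings' -d '
{
    "index.merge.policy.merge_factor": 10
}
'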

You can't estimate bulk cost just by counting documents or looking at the mapping. True, there are a few analyzers known for bad performance, but that doesn't apply to most analyzers, so the mapping is in most cases not critical for overall performance, unless you're doing really wild things. A better measure for estimating the cost of indexing a document is the size of the existing index, the size of the document, how frequently terms occur in the index and in the document, and the maximum number of distinct words/terms per field. You should think in "streams" and in byte volumes per second that you put on the machine.

For improving bulk indexing "smoothness", you can study org.elasticsearch.action.bulk.BulkProcessor. It demonstrates how to set an upper limit on concurrency: if the bulk processor exceeds a certain concurrency level, it simply waits on the client side. This helps prevent the ES cluster from being overloaded, and the bulk ingest behaves more predictably.

Jörg

Am 26.02.13 04:28, schrieb Daniel Myung:
> It was interminably slow for about an hour during this time, but then
> it inexplicably sped up to < 1 second bulk insert times. It
> experienced another blowup in insertion times, but now it is back to
> sub 1500ms insertion times.

Re: understanding scaling & clustering decision making

Clinton Gormley-2
To add to what Jörg said

> First, your disk I/O subsystem seems too slow for your requirements;
> think about faster disk writes. This is the slowest part of the whole
> system, and improving it helps quickly without changing any code.

Merges happen all the time in the background, and usually they are
pretty light (3 small segments get merged into a bigger, but still small
segment).

Unfortunately, when you have 3 big segments getting merged into, e.g., a 5GB segment, that can use a lot of I/O, which impacts the performance of the system.

If you can afford SSDs, definitely go for it. Otherwise, you will want
to throttle the speed of merges -- currently they happen as fast as your
system allows.

Take a look at "store level throttling" on
http://www.elasticsearch.org/guide/reference/index-modules/store.html
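
Besides the per-index setting you already have, there is a node-wide variant that can be adjusted on the fly through the cluster update-settings API, roughly like this (the 2mb value is only a starting point to experiment with):

curl -XPUT 'localhost:9200/_cluster/settings' -d '
{
    "persistent": {
        "indices.store.throttle.type": "merge",
        "indices.store.throttle.max_bytes_per_sec": "2mb"
    }
}
'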

Also, leaving only 25% of your RAM for the filesystem caches is insufficient. We recommend 50% of RAM for ES_HEAP and 50% for the system. The filesystem caches help a lot with disk access speed.
clint



Re: understanding scaling & clustering decision making

Daniel Myung
Thanks for all the information. I figured it was the merge operation at work. In my other tests, and while trying out Sematext's SPM, CPU and total system memory usage were spiking like crazy during the slow periods.

What was curious was that during my 1-shard test, total OS memory usage stayed low, hence my confusion.

I was initially on the 50% RAM setup when I ran into the first merge slowdown, so I bumped it to 75%. I'll flip it back to 50% to see if the behavior changes.

You can't estimate bulk cost just by counting documents or looking at the mapping. True, there are a few analyzers known for bad performance, but that doesn't apply to most analyzers, so the mapping is in most cases not critical for overall performance, unless you're doing really wild things. A better measure for estimating the cost of indexing a document is the size of the existing index, the size of the document, how frequently terms occur in the index and in the document, and the maximum number of distinct words/terms per field. You should think in "streams" and in byte volumes per second that you put on the machine.

So, if I read this right: for a given index size and document, the impact is better measured by the total I/O throughput the document will incur on the index, depending on which of its content gets indexed. In my situation, once my index exceeded my heap space, it went to slow-disk land, and that's where things got challenging with merges and the overall I/O demand of each indexing operation.

So in my index _settings I set a 5mb store-level throttle, and yet things were still taxing. Is that the indicator that my disks are terrible: that the throttle was set to 5mb and it still crushed my I/O?

Thanks again!
Dan


Re: understanding scaling & clustering decision making

Ivan Brusic
Does your 5-shard testbed also have the same JVM memory settings (6GB out of 8GB)? The same merge factor? A higher merge factor is great when you have the memory. Once you move beyond indexing tests and start searching against the index (while indexing), the memory pressure will become even more evident.

-- 
Ivan



Re: understanding scaling & clustering decision making

Daniel Myung
Yes, same 75% of 8GB RAM, and same merge factor.



Re: understanding scaling & clustering decision making

joergprante@gmail.com
Did you check in the logs if the setting was successfully applied? I
only know this syntax (at index creation)

curl -XPUT 'localhost:9200/myindex' -d '
{
     "index.store.throttle.type": "merge",
     "index.store.throttle.max_bytes_per_sec": "5mb"
}
'
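
A quick way to verify what actually got applied is to read the settings back and compare them against what you sent:

curl 'localhost:9200/myindex/_settings?pretty'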

Jörg

Am 26.02.13 18:44, schrieb dmyung:
> So in my index _settings I set a 5mb store-level throttle, and yet
> things were still taxing. Is that the indicator that my disks are
> terrible: that the throttle was set to 5mb and it still crushed my I/O?


Re: understanding scaling & clustering decision making

Clinton Gormley-2
On Tue, 2013-02-26 at 19:29 +0100, Jörg Prante wrote:
> Did you check in the logs if the setting was successfully applied? I
> only know this syntax (at index creation)
>
> curl -XPUT 'localhost:9200/myindex' -d '
> {
>      "index.store.throttle.type": "merge",
>      "index.store.throttle.max_bytes_per_sec": "5mb"
> }
> '

Also note that 5mb may still be too much; experiment with it.

clint



Re: understanding scaling & clustering decision making

Daniel Myung
Yes, curling the index shows the settings are present. I'll experiment with a more restrictive throttle. Thanks!



Re: understanding scaling & clustering decision making

Daniel Myung
So, to revisit this thread (having learned a lot from other issues on the group), I thought I'd give a report on our next steps, as our admittedly unusual ES use case may be of some interest.

Revisiting the problem:
Our CASE index is on a 2x8GB cluster, and we were trying to index 500,000 documents. About halfway through bulk indexing we'd start getting exceedingly high indexing times, due to what ultimately turned out to be merge operations. What had been 200-300ms index times turned into 10-15 minute waits for 500-doc bursts. Indexing runs were multi-day ordeals of tuning our timeout and retry thresholds, as well as the cluster fault-detection parameters, so that a node wouldn't fall out of the cluster from being so overwhelmed. Even _status and other diagnostic API calls on the cluster were taking 10-30 seconds to return.

The actual index and our data
  • The CASE index is on a 5-shard setup.
  • 525,000 documents
  • 26.5 GB, default analyzer
  • We dump our JSON data (from CouchDB) unmodified into ES.
  • When querying, we actually use _source to pull the data back out and serve it up (so we'd prefer not to fiddle with/modify the doc on its way into ES).

This version of the index had a total of 788 unique types.
In my wisdom (at the time), I thought I needed to separate the types out by customer and project, so as to keep all properties "pure" in the off chance that one customer might make a field a datetime while another made it a string. Ultimately, I didn't even make that distinction, and just set { dynamic: true, date_detection: false }.
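
Roughly, that meant each type's root mapping carried just those two flags, something like this (the type name here is a placeholder):

curl -XPUT 'localhost:9200/case_index/customer_x_case/_mapping' -d '
{
    "customer_x_case": {
        "dynamic": true,
        "date_detection": false
    }
}
'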

Anyway, some numbers:
  • 788 types
  • 41 properties per type on average, 24 of which are common across all docs (5 of them being datetimes)
  • std dev of properties per type was 28
  • Total unique property names: 5,500
  • (no numbers yet on the distribution of docs with many properties vs. few; I can try to get them if people are curious)
  • For our needs we changed the tokenizer to whitespace; it cut the total index size in half, to 13 GB, but indexing was still crushing our system

One of the indexed properties was an array of transactions, on which I mapped 3 datetime properties.
All the rest of the fields were string types.

A call to the index's _mapping returned us a 20MB mapping file :)

We kept crushing our initial single 8GB node when trying to index, and adding a 2nd node to the cluster only postponed the inevitable crush by, say, another 50-100,000 documents.

The Solution: Scrap and start over

We ultimately decided to scrap the fully dynamic mapping on ALL these properties, both because of the obvious scale issue and after re-examining how we actually want to use it. The 90% rule applied here: 90% of the uses of our index (in our current service deployment) only require queries against the 24 globally shared properties mentioned above. Also, since we were being rather blunt with our mapping and setting ALL properties to strings, it made sense to have properties fully mapped out on a per-customer basis, based on need. That will eventually let us craft a better mapping (or tide us over until we can automate getting all the properties correctly typed ahead of time).

So we made one type for the CASE index, CASE_TYPE, covering those 24 shared properties, with { dynamic: false }. Index size: 1.5GB, and it went lightning fast; the limiting factor was our speed getting the docs out of the database. I no longer envy the seemingly otherworldly indexing speeds I read about on this group!
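
As a sketch of that setup (the field names below are invented; the real ones are the 24 shared properties):

curl -XPUT 'localhost:9200/case_index' -d '
{
    "mappings": {
        "case_type": {
            "dynamic": false,
            "properties": {
                "case_id":   { "type": "string", "index": "not_analyzed" },
                "owner_id":  { "type": "string", "index": "not_analyzed" },
                "opened_on": { "type": "date" }
            }
        }
    }
}
'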

We made a 2nd index, FULL_CASES, for customers that have fully defined property mappings set up with us; there we do set { dynamic: true }, and we pre-inject the mapping with all the types before indexing the first doc. As this customer count is still small, the FULL_CASES index is small and manageable... for now.

The Future
The question does come up though, if we do ramp up and need to completely map out all properties for more customers, how can we prepare for this? If we reach the numbers above again, we'll crush our system.  To reiterate, we think we need to keep unique customer CASE_TYPES in unique types.  Customer X might say last_encounter is a string "at the bowling alley", but Customer Y might say last_encounter is a datetime. We were told that this was slightly outside the norm of ES usage - if there are suggested workarounds or things for us to tune differently we certainly are all ears.

Thanks,

Dan


Re: understanding scaling & clustering decision making

Clinton Gormley-2

>
>
> A call to the index's _mapping returned us a 20MB mapping file :)

WOW!

> So we made one type for the CASE index, CASE_TYPE, covering those 24
> shared properties, with { dynamic: false }. Index size: 1.5GB, and it
> went lightning fast; the limiting factor was our speed getting the
> docs out of the database. I no longer envy the seemingly otherworldly
> indexing speeds I read about on this group!

heh :)

> The Future
>
> The question does come up though, if we do ramp up and need to
> completely map out all properties for more customers, how can we
> prepare for this? If we reach the numbers above again, we'll crush our
> system.  To reiterate, we think we need to keep unique customer
> CASE_TYPES in unique types.  Customer X might say last_encounter is a
> string "at the bowling alley", but Customer Y might say last_encounter
> is a datetime. We were told that this was slightly outside the norm of
> ES usage - if there are suggested workarounds or things for us to tune
> differently we certainly are all ears.

Yeah, your two choices are:
1) different indexes
2) name the fields differently
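
For example, option 2 could mean giving the conflicting field customer-specific names in the mapping; the index, type, and field names below are only illustrative:

curl -XPUT 'localhost:9200/full_cases/case/_mapping' -d '
{
    "case": {
        "properties": {
            "last_encounter_text": { "type": "string" },
            "last_encounter_date": { "type": "date" }
        }
    }
}
'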

clint

