How to improve search performance?

Timo
Here is my situation:

1. We use an ES cluster to store logs from hundreds of applications. We keep the logs for only 7 days. The data volume is almost 1TB per day, and we expect it to reach 5TB per day in the future.

2. We put all the data from one day into one index (data from different days goes into different indices). Each index is split into 5 shards, with only 1 replica. We have 10 nodes in total.

3. We have not applied any specific configuration; almost everything is at its default.

4. Users only search data from one application, but possibly across 2 or 3 days (a search spans several indices; daily indices let us drop data older than 7 days and keep the data flow from clients balanced).
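
To make the daily-index layout in points 2 and 4 concrete, here is a minimal sketch of how the index names for a 2 or 3 day search, and the 7-day expiry check, might be computed. The `logs-YYYY.MM.DD` naming scheme and the function names are assumptions for illustration; the thread does not say how the indices are actually named.

```python
from datetime import date, timedelta

# Hypothetical naming scheme: one index per day, e.g. "logs-2012.10.10".
INDEX_PREFIX = "logs-"
RETENTION_DAYS = 7  # the thread keeps logs for 7 days


def index_name(day: date) -> str:
    """Name of the daily index that holds the logs for `day`."""
    return f"{INDEX_PREFIX}{day:%Y.%m.%d}"


def indices_to_search(today: date, days_back: int) -> list[str]:
    """Daily indices covering today and the previous days_back - 1 days."""
    return [index_name(today - timedelta(days=n)) for n in range(days_back)]


def is_expired(index_day: date, today: date) -> bool:
    """True once a daily index has fallen outside the retention window."""
    return (today - index_day).days >= RETENTION_DAYS
```

A 3-day search would pass the joined names (comma-separated) as the index part of the search URL, and a nightly job could delete every index for which `is_expired` is true.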

How is it going now?
It seems to be just OK, but we are almost at the limit of the current cluster. We cannot add any more applications now, or the cluster will crash or become too slow.

I have looked through the ES guide, and one idea I have is routing. Can anybody offer some suggestions? I would really appreciate it.

1. When we index data, we use "routing" with app_name (or app_name and date)?
2. When we search data, we use the same routing.
3. Because we only store Tomcat logs, we can specify the language as English.

My questions are:
1. I don't know how to handle routing in this situation. Is it enough to use "app_name" as the routing param, or do I have to add the date?
2. Is specifying the language helpful? I mean "langId":"english" when indexing data.
3. Is there any other way that helps? 5TB per day is really a challenge.

Re: How to improve search performance?

dadoonet
Hi,
 
 
> 4. Users only search data from one application, but possibly across 2 or 3 days
> (a search spans several indices; daily indices let us drop data older than
> 7 days and keep the data flow from clients balanced).
That's a very important point to consider.
 
> 1. When we index data, we use "routing" with app_name (or app_name and
> date)?
> 2. When we search data, we use the same routing.
> 3. Because we only store Tomcat logs, we can specify the language as English.
>
> My questions are:
> 1. I don't know how to handle routing in this situation. Is it enough to use
> "app_name" as the routing param, or do I have to add the date?
Yes. Use routing at both index time and search time, with app_name as the routing value. Each user request will then hit a single shard in each index it searches, so it will be faster.
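
As a rough illustration of that advice, the sketch below builds the index and search requests with the same `routing` value. It only constructs the URLs and bodies and sends nothing. The host, the index names, the `message` and `app_name` field names, and the typeless `_doc` endpoint (which assumes a recent Elasticsearch version) are all assumptions, not details from this thread.

```python
import json
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumption: a node reachable locally


def index_request(index: str, app_name: str, doc: dict) -> tuple[str, str, str]:
    """Build (method, url, body) to index one log document routed by app_name."""
    url = f"{ES_URL}/{index}/_doc?{urlencode({'routing': app_name})}"
    return "POST", url, json.dumps(doc)


def search_request(indices: list[str], app_name: str, text: str) -> tuple[str, str, str]:
    """Build (method, url, body) to search several daily indices with the same routing."""
    url = f"{ES_URL}/{','.join(indices)}/_search?{urlencode({'routing': app_name})}"
    body = {
        "query": {
            "bool": {
                "must": [{"match": {"message": text}}],
                # Several apps can share one shard, so still filter by app_name.
                "filter": [{"term": {"app_name": app_name}}],
            }
        }
    }
    return "POST", url, json.dumps(body)
```

The filter on `app_name` matters because routing only narrows the search to a shard; it does not exclude documents from other applications that were routed to the same shard.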
 
> 2. Is specifying the language helpful? I mean "langId":"english" when
> indexing data.
By default, an English-oriented analyzer is applied: the standard analyzer. So if you want to remove stop words, you don't have to define anything here.
If you want to keep all terms, prefer the simple analyzer. It tokenizes and lowercases everything without filtering out any words. It really depends on your use case.
So define a mapping first, before creating your daily index. For your use case, an index template should be helpful.
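
A minimal sketch of what such a template body could look like, built as a Python dict. This assumes the composable index template format (`PUT _index_template/<name>`) found in recent Elasticsearch versions; older versions use a different template API and mapping layout. The `logs-*` pattern and the `app_name` and `message` field names are hypothetical.

```python
import json


def daily_logs_template(shards: int = 5, replicas: int = 1) -> dict:
    """Template body applied to every new index whose name matches logs-*."""
    return {
        "index_patterns": ["logs-*"],  # hypothetical daily index naming
        "template": {
            "settings": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            },
            "mappings": {
                "properties": {
                    # Exact-match field, used for filtering by application.
                    "app_name": {"type": "keyword"},
                    # Simple analyzer: lowercases and keeps every term.
                    "message": {"type": "text", "analyzer": "simple"},
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(daily_logs_template(), indent=2))
```

With a template in place, each new daily index picks up the mapping and shard settings automatically when it is created.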
 
 
> 3. Is there any other way that helps? 5TB per day is really a challenge.
My English is too poor. I don't understand the question. :-/
 
Does it help?
 
 
--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Re: How to improve search performance?

Timo
Yes, it is really helpful. Thanks!
My English is poor too.

My last question is whether there is anything else I can do to improve performance, because our cluster will have to handle a large amount of data in the future, about 5TB per day.
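
For a sense of scale, here is a back-of-the-envelope sketch of the numbers mentioned in this thread (7-day retention, 1 replica, 10 nodes). It assumes the on-disk index size equals the raw log volume and that shards spread evenly over nodes, neither of which is guaranteed in practice.

```python
def per_shard_gb(daily_tb: float, primary_shards: int) -> float:
    """Approximate size of one primary shard of a daily index, in GB."""
    return daily_tb * 1000 / primary_shards


def cluster_total_tb(daily_tb: float, retention_days: int = 7, replicas: int = 1) -> float:
    """Approximate data held in the cluster, counting primaries and replicas."""
    return daily_tb * retention_days * (1 + replicas)


def per_node_tb(daily_tb: float, nodes: int = 10) -> float:
    """Approximate data per node if shards are spread evenly."""
    return cluster_total_tb(daily_tb) / nodes


if __name__ == "__main__":
    for daily_tb in (1, 5):
        for shards in (5, 10, 20):
            print(daily_tb, "TB/day,", shards, "shards:",
                  per_shard_gb(daily_tb, shards), "GB per primary shard")
        print(daily_tb, "TB/day:", per_node_tb(daily_tb), "TB per node")
```

Under these assumptions, 5 shards at 1TB per day gives roughly 200GB per primary shard, and the same 5 shards at 5TB per day gives roughly 1TB per primary shard.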

Re: How to improve search performance?

Timo
And because I will try to use routing, I plan to create more shards, going from 5 to 10, maybe 20. Routing will send each user request to a single shard, so having more shards should not hurt performance. If we split the data into more but smaller shards, I think it will even become faster.
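
One thing that may be worth checking before settling on a shard count: with app_name as the routing value, all documents of one application land in the same shard of a daily index, so adding shards does not split a single large application. The sketch below simulates this with a stand-in hash. Elasticsearch picks the shard from a hash of the routing value relative to the number of primary shards, but `zlib.crc32` here is only an assumption for illustration and is not the hash Elasticsearch uses; the application names and volumes are made up.

```python
import zlib
from collections import Counter


def shard_for(routing: str, num_shards: int) -> int:
    """Map a routing value to a shard number (stand-in hash, not the real one)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def docs_per_shard(app_volumes: dict, num_shards: int) -> Counter:
    """Total documents per shard when every app is routed by its name."""
    load = Counter({shard: 0 for shard in range(num_shards)})
    for app_name, docs in app_volumes.items():
        load[shard_for(app_name, num_shards)] += docs
    return load


if __name__ == "__main__":
    # Hypothetical volumes: one very chatty application and many small ones.
    volumes = {"big-app": 1_000_000}
    volumes.update({f"app-{n}": 10_000 for n in range(200)})
    for shards in (5, 10, 20):
        load = docs_per_shard(volumes, shards)
        print(shards, "shards -> largest shard holds", max(load.values()), "docs")
```

If the log volume is very uneven across applications, the shard holding the largest application stays large no matter how many shards the index has, which is one case where adding the date to the routing value might be worth considering.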