1. We use an ES cluster to store logs from hundreds of applications. We keep the logs for only 7 days. The data volume is almost 1 TB per day, and we expect it to reach 5 TB per day in the future.
2. We put all the data from one day into one index (data from different days goes into different indices). Each index is split into 5 shards, with only 1 replica. We have 10 nodes in total.
3. We have not applied any specific configuration; almost everything is default.
4. Users only search data from one application, but possibly across 2 or 3 days, so a search spans several indices. (We use daily indices in order to drop data older than 7 days and to keep the data flow from clients balanced.)
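To put numbers on the setup above, here is a rough back-of-the-envelope sketch. It assumes that "1 replication" means one replica copy of each primary shard, that the on-disk index size equals the raw log volume (in practice it can be larger or smaller), and that 1 TB = 1000 GB. The function name `storage_estimate` is only for illustration.

```python
def storage_estimate(daily_tb, primary_shards=5, replicas=1,
                     retention_days=7, nodes=10):
    """Rough sizing for one-index-per-day with a fixed retention window."""
    # Size of a single primary shard of one daily index, in GB.
    primary_shard_gb = daily_tb * 1000 / primary_shards
    # Primaries plus replica copies, kept for the whole retention window.
    total_tb = daily_tb * (1 + replicas) * retention_days
    return {
        "primary_shard_gb": primary_shard_gb,
        "total_tb": total_tb,
        # Assumes shards are spread evenly over all nodes.
        "per_node_tb": total_tb / nodes,
    }


today = storage_estimate(daily_tb=1)
future = storage_estimate(daily_tb=5)
```

Under these assumptions, today that is 200 GB per primary shard and 14 TB in the cluster (1.4 TB per node); at 5 TB per day with the same layout it would be 1000 GB per primary shard and 70 TB in total (7 TB per node).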
How is it going now?
It seems to be just OK, but it has almost reached the limit of the current cluster. We cannot add any more applications now, or the cluster will crash or become too slow.
I have looked through the ES guide, and one idea I have is routing. Can anybody offer some suggestions? I would really appreciate it.
1. When we index data, we use "routing" with app_name (or app_name and date).
2. When we search data, we use the same routing.
3. Because we only store Tomcat logs, we can specify the language as English.
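The routing plan above could look roughly like the sketch below. It only builds the request URLs and the search body, and sends nothing. The `logs-YYYY.MM.DD` index naming, the `app_name` field, the base URL, and the helper names are all assumptions for illustration. The `routing` query parameter on the index and search endpoints is the part taken from the ES routing feature. The `_doc` endpoint is used here, while older ES versions use a type name in that position.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_indices(last_day, days, prefix="logs-"):
    """Names of the daily indices for `days` days ending at `last_day`."""
    return [prefix + (last_day - timedelta(days=i)).strftime("%Y.%m.%d")
            for i in range(days)]


def index_url(base, index, app_name):
    """URL for indexing one log document, routed by application name."""
    return f"{base}/{index}/_doc?" + urlencode({"routing": app_name})


def search_request(base, indices, app_name):
    """URL and body for searching one application across several days."""
    url = (f"{base}/{','.join(indices)}/_search?"
           + urlencode({"routing": app_name}))
    # Several applications can hash to the same shard, so the query must
    # still filter on app_name (assumed to be an exact-match field).
    body = {"query": {"bool": {"filter": [{"term": {"app_name": app_name}}]}}}
    return url, body


indices = daily_indices(date(2015, 6, 3), days=3)
url, body = search_request("http://localhost:9200", indices, "billing")
```

With the same routing value on both sides, a search over 3 daily indices touches one shard per index instead of all of them.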
My questions are:
1. I don't know how to handle routing in this situation. Is it enough to use "app_name" as the routing parameter, or do I have to add the date?
2. Is specifying the language helpful? I mean "langId":"english" when indexing data.
3. Is there any other approach that would help? 5 TB per day is really a challenge.
And because I will try to use routing, I plan to create more shards, going from 5 shards to 10, maybe 20. Routing will send each user request to a single shard, so having more shards should not be a performance problem. Even though we split the data into more but smaller shards, I think it will become faster.
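One thing worth checking before going this way is how evenly routing would spread the data. ES picks the shard roughly as hash(routing) modulo the number of primary shards, so all of one application's logs for a day land in a single shard. The sketch below simulates that with `zlib.crc32` as a stand-in hash (ES uses its own hash function, so real shard numbers will differ), and the application names and volumes are made up.

```python
import zlib


def shard_for(routing, num_shards):
    """Stand-in for shard selection: hash(routing) % number of primaries."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def shard_loads(app_volumes_gb, num_shards):
    """Daily GB per shard when every document is routed by app name."""
    loads = [0.0] * num_shards
    for app, gb in app_volumes_gb.items():
        loads[shard_for(app, num_shards)] += gb
    return loads


# Hypothetical day: 199 apps at 5 GB each, plus one noisy app at 100 GB.
apps = {"app-%03d" % i: 5.0 for i in range(1, 200)}
apps["app-000"] = 100.0

for n in (5, 10, 20):
    loads = shard_loads(apps, n)
    print(n, "shards: min", min(loads), "GB, max", max(loads), "GB")
```

Whichever shard receives the noisy application carries at least its full 100 GB, no matter how many shards there are, so more shards only help if no single application dominates. Also, within a daily index every document has the same date, so adding the date to the routing value changes which shard an application lands on each day but still keeps it on one shard per index.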