Understanding my Index using HEAD plugin


IronMike
I am trying to understand my index (see the attached screenshot) and how I can improve its size and performance.
The goal is to index 5 million docs, so I started small by indexing 421,000 docs, as shown in the image.
- I am using two nodes (1 and 2); node 1 is the master.

Q1) My index size is 3.99 GB with 421,627 docs so far, so I am guessing it will be over 40 GB with 5 million docs? Does that size sound too big? I do have store = YES for the PDF docs.
Q2) What is the (8 GB) in the image? Is this the size across the 2 nodes? Also, what is (526,428)?
Q3) Should I use more nodes, or more/fewer shards?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/073c71bf-7499-4abc-8da6-5a381834c1bc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Attachment: Screen Shot 2014-06-27 at 10.53.58 AM.png (16K)

Re: Understanding my Index using HEAD plugin

sri
Hi,

- To approximate the size, try running some more tests and you should be able to get an idea; the size also depends very much on the type of data you are indexing.
- ElasticHQ (www.elastichq.org) can give you more insight into the details of the cluster; the size per index can be seen under the 'node diagnostics' tab.
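Besides the ElasticHQ UI, the same per-index figures can be read from Elasticsearch's index stats REST API (`GET /<index>/_stats/docs,store`). Below is a minimal Python sketch, not a tested tool: it assumes a node reachable over plain HTTP, and the host `http://localhost:9200` and index name `myindex` used below are hypothetical placeholders. The response fields it reads (`indices`, `primaries`, `total`, `docs.count`, `docs.deleted`, `store.size_in_bytes`) come from the index stats API, but check them against your Elasticsearch version.

```python
import json
import urllib.request


def fetch_index_stats(host: str, index: str) -> dict:
    """Fetch the docs and store sections of the index stats API for one index."""
    url = f"{host}/{index}/_stats/docs,store"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(stats: dict, index: str) -> dict:
    """Extract the size and document figures discussed in this thread."""
    idx = stats["indices"][index]
    primaries, total = idx["primaries"], idx["total"]
    return {
        # Live (not deleted) documents in the primary shards.
        "docs": primaries["docs"]["count"],
        # Deleted documents still held in segments until a merge drops them.
        "deleted": primaries["docs"]["deleted"],
        # On-disk size of the primary shards only.
        "primary_bytes": primaries["store"]["size_in_bytes"],
        # On-disk size of primaries plus all replicas.
        "total_bytes": total["store"]["size_in_bytes"],
    }
```

Usage, with the hypothetical host and index name: `summarize(fetch_index_stats("http://localhost:9200", "myindex"), "myindex")`. Keeping the HTTP call separate from the parsing makes the parsing easy to check against a saved response.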

Thanks and Regards
Sri

In reply to IronMan2014 (Friday, June 27, 2014 11:12:32 AM UTC-4)

Re: Understanding my Index using HEAD plugin

IronMike
Great. One of the stats is "deleted docs" or merge rate, which shows 18% in my example; HQ says that if this number is high, it means slow I/O.
I am not really sure whether 19% is high. How can I control this number?

In reply to sri (Friday, June 27, 2014 11:34:32 AM UTC-4)

Re: Understanding my Index using HEAD plugin

shadyabhi
In reply to this post by IronMike
See my answers inline.

On Fri, Jun 27, 2014 at 8:42 PM, IronMan2014 <[hidden email]> wrote:
> I am trying to understand my index (attached in screenshot) and how can I
> improve size and performance.
> The goal is to index 5 million docs. So, I started small by indexing 421,000
> docs as shown in the image.
> - I am using two nodes (1 & 2) ; node 1 master
>
> Q1) My index size is 3.99 GB with 421,627 docs so far, so I am guessing it
> will be over 40 GB with 5 million docs? Does the size sound too big? I do
> have store = YES for PDF docs.

The total size depends on the kind of docs you index, so it depends!
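If you want a first-order number anyway, one rough approach is to assume the index grows linearly with the document count. That is an assumption, not a rule: bytes per document depend on the data and on the mapping (for example, whether fields are stored), so a sample only predicts the full index if it is representative. A Python sketch using the figures from the original post:

```python
def extrapolate_size_gb(sample_gb: float, sample_docs: int, target_docs: int) -> float:
    """Linear extrapolation: assumes the average size per document stays constant."""
    gb_per_doc = sample_gb / sample_docs
    return gb_per_doc * target_docs


# Figures from the original post: 3.99 GB of primary shards for 421,627 docs.
estimate = extrapolate_size_gb(3.99, 421_627, 5_000_000)
print(f"~{estimate:.1f} GB of primary shards for 5 million docs")
```

Under that assumption the estimate comes out at roughly 47 GB of primary shards, before counting replicas.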

> Q2) What is the (8 GB) from the image, is this the size on the 2 nodes?
> Also, what is (526,428) ?

The total size of the primary shards is 3.99 GB in your case. So, 3.99 GB
would be the total size if you had zero replicas. As you have 1
replica configured, the actual disk space used is 8 GB.
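The arithmetic behind that answer: each replica is a full copy of every primary shard, so disk usage is roughly the primary size multiplied by (1 + number of replicas). It is approximate because each shard copy merges its segments independently, so replica sizes can differ slightly from the primaries. A minimal sketch:

```python
def total_store_gb(primary_gb: float, replicas: int) -> float:
    """Approximate disk use: primaries plus one full copy per replica."""
    return primary_gb * (1 + replicas)


# 3.99 GB of primaries with 1 replica, as in this thread.
print(f"{total_store_gb(3.99, 1):.2f} GB")  # 7.98 GB, i.e. the ~8 GB figure
```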

Regarding the number of documents, 421,627 is the number of live docs
in your index. 526,428 is max_docs, which also counts deleted docs that
have not yet been removed by a segment merge.
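Since max_docs also counts deleted documents, the gap between the two figures is the number of deleted docs still waiting to be merged away. A small Python sketch using the two numbers from this thread; it computes the deleted share relative to max_docs, and whether ElasticHQ's "deleted docs" percentage uses exactly this formula is an assumption, not something confirmed here.

```python
def deleted_doc_stats(num_docs: int, max_docs: int) -> tuple:
    """Return (deleted count, deleted share of max_docs).

    max_docs includes deleted docs that still sit in segments, so the
    difference is what merges can eventually reclaim.
    """
    deleted = max_docs - num_docs
    return deleted, deleted / max_docs


deleted, share = deleted_doc_stats(421_627, 526_428)
print(deleted, f"{share:.1%}")  # 104801 19.9%
```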

> Q3) Should I do more nodes, more/less shards?

That really depends. I would suggest running some tests to find out what
works best for you. Even 2 shards will work in your case, as you have
2 nodes and each primary shard will go to a different node. But
then, it'll limit your options for adding nodes if you need more
later (of course, there are workarounds).
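To see why the primary shard count limits later growth: the number of primary shards is fixed when the index is created, and Elasticsearch never places a replica on the same node as its primary, so an index with P primaries and R replicas has P × (1 + R) shard copies to spread across nodes. Extra nodes beyond that count hold no shards of this index unless the replica count is raised. A rough Python sketch of that arithmetic, ignoring allocation details such as awareness rules and disk thresholds:

```python
def shard_layout(primaries: int, replicas: int, nodes: int) -> dict:
    """Rough model of one index's shard copies spread evenly over data nodes.

    Assumes there are enough nodes for every replica to be assigned.
    """
    copies = primaries * (1 + replicas)
    return {
        # Primaries plus every replica copy.
        "shard_copies": copies,
        # Average copies per node if allocation spreads them evenly.
        "copies_per_node": copies / nodes,
        # With this replica count, nodes beyond this number get no shards of the index.
        "max_nodes_used": copies,
    }


print(shard_layout(primaries=2, replicas=1, nodes=2))
# {'shard_copies': 4, 'copies_per_node': 2.0, 'max_nodes_used': 4}
```

For comparison, with the Elasticsearch 1.x default of 5 primary shards and 1 replica, `shard_layout(5, 1, 2)` gives 10 shard copies, so that index could spread across up to 10 nodes.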




--
Cheers,
Abhijeet Rastogi (shadyabhi)
