More indices vs. more types


More indices vs. more types

Christian Aust
Hi all,

how is having multiple indices with just one document type different from having one index with multiple document types? When do I choose what?

My application sorts topics into namespaces. Most of the time I need to search all topics that belong to a namespace. My first implementation was to have an index for each topic, doing a multi-index search for a namespace. It works, but would the other way be different? How? I use faceting a lot.
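For illustration, here is a minimal sketch of how the two layouts differ at search time. It only builds the request paths; the index and type names are hypothetical, and the type-in-path form assumes an Elasticsearch version that still supports multiple document types per index.

```python
def multi_index_search_path(topic_indices):
    """Layout A: one index per topic, searched together as a comma-separated list."""
    return "/" + ",".join(topic_indices) + "/_search"


def multi_type_search_path(namespace_index, topic_types=None):
    """Layout B: one index per namespace, with each topic stored as a document type.

    With no types given, the whole namespace index is searched.
    """
    if topic_types:
        return "/" + namespace_index + "/" + ",".join(topic_types) + "/_search"
    return "/" + namespace_index + "/_search"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(multi_index_search_path(["topic_a", "topic_b", "topic_c"]))
    print(multi_type_search_path("namespace_1"))
    print(multi_type_search_path("namespace_1", ["topic_a", "topic_b"]))
```

In layout B a namespace-wide search is a single-index request, whereas layout A has to list (or alias) every topic index that belongs to the namespace.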

Some numbers: A topic contains 10^2-10^6 documents. A namespace consists of 10-30 topics. Usually I have ~20 different namespaces per application instance.

Any help is appreciated. Kind regards,

Christian

Re: More indices vs. more types

Eric Jain
On May 17, 10:18 am, Christian Aust <christian.a...@software-
consultant.net> wrote:
> how is having *multiple indices with just one document type* different from
> having *one index with multiple document types*? When do I choose what?

One advantage of multiple indexes is that you can close indexes that
are no longer needed. Searching multiple indexes should also be
faster--but only if the indexes are spread over enough machines. Have
you considered having one index per namespace?
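
As a rough sketch of the "close indexes that are no longer needed" point: closing and reopening are plain POST requests against the index. The helper names and the index name below are made up for illustration; only the `_close` and `_open` endpoints are Elasticsearch's own.

```python
def close_index_request(index):
    """Return (HTTP method, path) for closing an index that is not currently needed."""
    return ("POST", "/" + index + "/_close")


def open_index_request(index):
    """Return (HTTP method, path) for reopening a previously closed index."""
    return ("POST", "/" + index + "/_open")


if __name__ == "__main__":
    # "topic_archive_2011" is a hypothetical index name.
    print(close_index_request("topic_archive_2011"))
    print(open_index_request("topic_archive_2011"))
```

A closed index cannot be searched until it is reopened, so this only helps for topics that are rarely queried.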

Re: More indices vs. more types

Christian Aust
On 18.05.2012 at 20:53, Eric Jain wrote:

> On May 17, 10:18 am, Christian Aust <christian.a...@software-
> consultant.net> wrote:
>> how is having *multiple indices with just one document type* different from
>> having *one index with multiple document types*? When do I choose what?
>
> One advantage of multiple indexes is that you can close indexes that
> are no longer needed. Searching multiple indexes should also be
> faster--but only if the indexes are spread over enough machines. Have
> you considered having one index per namespace?

The current implementation uses one index per topic, making searching a namespace a little more complex. I assume that searching 30 indices simultaneously will come with a performance penalty. Right?

Lately I was wondering if I should move topics as document types into one index per namespace. I do not understand the consequences yet. Is it "expensive" to create hundreds of indices, or is it worse to have dozens of document types per index?

I don't know. Does anybody else? Regards,

Christian


Re: More indices vs. more types

Eric Jain
On Fri, May 18, 2012 at 12:39 PM, Christian Aust
<[hidden email]> wrote:
> The current implementation uses one index per topic, making searching a namespace a little more complex. I assume that searching 30 indices simultaneously will come with a performance penalty. Right?

Right (assuming the indexes are all on one machine or have few documents).


> Lately I was wondering if I should move topics as document types into one index per namespace. I do not understand the consequences yet. Is it "expensive" to create hundreds of indices, or is it worse to have dozens of document types per index?

I don't know if there is a general answer to that question. If queries
are run on a single namespace (and there are enough documents in each
namespace), having one index (or perhaps shard) per namespace seems
like the way to go.

I don't think elasticsearch has issues handling a few hundred indexes
or an index with a few dozen types, but there's no way around doing your
own performance testing...

Re: More indices vs. more types

Seb
ES has a bit of trouble handling a large number of constantly open
indexes; don't do it!

I had something in production with a couple of hundred indexes and it
pretty much died every 2 weeks because of memory issues. Fortunately we
were able to convert those indexes to types and haven't had "big"
issues with ES since.

Shay told me 6 months ago that it is much better to use few indexes
with types and aliases than actual indexes - I'm sure that advice
still applies.
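
One way to read the "types and aliases" advice is a filtered alias: a single physical index, with one alias per topic that applies a term filter. The sketch below only builds the JSON body for a POST to `/_aliases`; the index, alias, and field names are assumptions for illustration, not taken from anyone's setup in this thread.

```python
import json


def filtered_alias_actions(index, alias, field, value):
    """Build the body for POST /_aliases that adds a filtered alias.

    Searches against `alias` only see documents in `index` whose
    `field` matches `value`.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    "filter": {"term": {field: value}},
                }
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: one index per namespace, one alias per topic.
    body = filtered_alias_actions("namespace_1", "topic_a", "topic", "a")
    print(json.dumps(body, indent=2))
```

The application can then keep searching by topic name while the cluster only holds one index per namespace.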

You should also expect occasional shard failures resulting in
inconsistencies, but that can easily be mitigated by closing and
reopening the index or simply restarting the node. I have to do that
every 1 or 2 months.

Re: More indices vs. more types

Clinton Gormley-2

> You also should expect occasional shard failures resulting in
> inconsistencies but that can be easily mitigated by just opening and
> closing an index or
> simply restarting the node. I've to do that every 1 or 2 months.

I'm curious as to why you get occasional shard failures.  We've been
making heavy use of ES for over 2 years now, and I never need to touch
my boxes. They just keep running.

Are you using virtual servers or your own boxes? What environment, EC2
or hosted? How much memory, CPU etc?

clint


Re: More indices vs. more types

Seb
Hi clint,

I'm running two dedicated Dell servers with Xeon L5520 CPUs and 72GB of
memory. The index that is failing is a very active one with many
thousands of writes per day.
What are you using to store your indices? I'm using the local
filesystem.

Here is the exception I'm getting; to me it seems like the file
pointer is wrong.

[2012-05-17 00:00:06,989][WARN ][index.shard.service      ] [Spot] [classifieds][0] Failed to perform scheduled engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException: [classifieds][0] Refresh failed
        at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:789)
        at org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:419)
        at org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:706)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.FileNotFoundException: /home/user/els_main/data/search/nodes/0/indices/classifieds/0/index/_pft0.prx (Operation not permitted)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(Unknown Source)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:70)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:92)
        at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
        at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:452)
        at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:89)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:705)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:680)
        at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:201)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3651)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3588)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:452)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:401)
        at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:428)
        at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:448)
        at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:396)
        at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:520)
        at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:764)
        ... 5 more
[2012-05-17 00:00:07,325][WARN ][index.merge.scheduler    ] [Spot] [classifieds][0] failed to merge
java.io.FileNotFoundException: /home/user/els_main/data/search/nodes/0/indices/classifieds/0/index/_pft0.prx (Operation not permitted)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(Unknown Source)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:70)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:92)
        at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:79)
        at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:452)
        at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:89)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:705)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:680)
        at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:201)
        at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:4086)
        at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:4040)
        at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:354)
        at org.elasticsearch.index.merge.scheduler.ConcurrentMergeSchedulerProvider$CustomConcurrentMergeScheduler.merge(ConcurrentMergeSchedulerProvider.java:104)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2746)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2740)
        at org.elasticsearch.index.engine.robin.RobinEngine.maybeMerge(RobinEngine.java:963)
        at org.elasticsearch.index.shard.service.InternalIndexShard$EngineMerger$1.run(InternalIndexShard.java:750)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

Re: More indices vs. more types

Radu Gheorghe
In reply to this post by Christian Aust
Hi Christian,

It also depends on the number of shards and replicas you have
configured per-index.

I have no idea what the absolute limit of total shards is (I suppose
it also depends on your hardware), but I think that having many
indices would slow your searches down, because each shard is a
separate Lucene index. Indexing operations should get faster, though.

So if you get a lot of documents to be indexed, having many indices
(thus shards) should help. If not, I would stick with multiple types.
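
To put rough numbers on this, here is a back-of-the-envelope sketch using the figures from Christian's first post (about 20 namespaces, up to 30 topics each). The defaults of 5 primary shards and 1 replica per index are an assumption about the cluster's settings; adjust them to whatever your indices are actually configured with.

```python
def total_shards(num_indices, shards_per_index=5, replicas=1):
    """Total Lucene indices (primary shards plus replica copies) the cluster must hold."""
    return num_indices * shards_per_index * (1 + replicas)


if __name__ == "__main__":
    namespaces = 20
    topics_per_namespace = 30  # upper end of the 10-30 range

    # One index per topic: 600 indices.
    print(total_shards(namespaces * topics_per_namespace))  # 6000

    # One index per namespace, topics as types: 20 indices.
    print(total_shards(namespaces))  # 200
```

Under these assumptions the per-topic layout means thousands of Lucene indices, which is why lowering the shard count per index (or moving to types) matters before adding more indices.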

Although, as Eric said, you need to test to be sure.

Re: More indices vs. more types

Clinton Gormley-2
In reply to this post by Seb
Hiya

> I'm running two dedicated dell servers with xeon L5520, 72GB.
> The index that is failing is an very active one with many thousands of
> writes per day.
> What are you using to store your indices, I'm using the local
> filesystem.

> [2012-05-17 00:00:07,325][WARN ][index.merge.scheduler    ] [Spot] [classifieds][0] failed to merge
> java.io.FileNotFoundException: /home/user/els_main/data/search/nodes/0/indices/classifieds/0/index/_pft0.prx (Operation not permitted)


I'm wondering if you are running into an open file limit, or running
out of inodes. Merging increases the number of filehandles considerably
(but temporarily).

Have a look in /var/log/messages|syslog - see if there is anything
there, and try raising your ulimit -n. Hopefully that'll help
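
A quick way to check both suspects from a script, as a sketch only: it reports the open-file limit of the process that runs it and the inode usage of a given path, so it needs to run on the ES box, as the same user that runs ES, and against the ES data directory. The function names are made up; the default path "." is just a placeholder.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def inode_usage(path="."):
    """Return (free inodes available to this user, total inodes) for the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_favail, st.f_files


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("open files: soft=%d hard=%d" % (soft, hard))

    # Replace "." with the Elasticsearch data directory.
    free, total = inode_usage(".")
    print("inodes: %d free of %d" % (free, total))
```

If the soft limit is low, raising it with ulimit -n before starting ES is the thing to try first.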

clint

