Re: Why isn't Elasticsearch using Sha1 for id?

Dan Pilone
Do you mean the SHA1 of the document itself? I'm pretty sure that would be a problem for us, as we can have nested documents with identical values. For example, you could have:

<some_doc>
  <author>
    <firstname>Dan</firstname>
    <lastname>Pilone</lastname>
  </author>
  ...
</some_doc>

If "author" is indexed as a nested document, it will need an id, and that id can't just be the SHA1 of the content we provided, since two documents with the same author would produce the same hash. Now, I suppose that if the _parentId were part of the "content", the SHA1 would be different, but I don't know when or how the parent id is associated with nested docs. -- Dan
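
A minimal sketch of the concern above, assuming the id is the SHA1 of a canonical JSON rendering of the content. The helper names `content_id` and `content_id_with_parent`, the parent ids, and the wrapping of the parent id into the hashed payload are all hypothetical illustrations, not how Elasticsearch actually assigns ids to nested documents.

```python
import hashlib
import json


def content_id(doc: dict) -> str:
    """Return the SHA1 hex digest of a canonical JSON rendering of doc."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def content_id_with_parent(doc: dict, parent_id: str) -> str:
    """Hypothetical variant: hash the parent id together with the content."""
    return content_id({"_parentId": parent_id, "content": doc})


# The same nested author appearing under two different parent documents.
author = {"firstname": "Dan", "lastname": "Pilone"}

id_under_doc_a = content_id(author)
id_under_doc_b = content_id(author)

# Content-only hashing cannot tell the two nested documents apart.
print(id_under_doc_a == id_under_doc_b)

# Mixing in a (hypothetical) parent id makes the two ids differ.
print(content_id_with_parent(author, "doc-a") != content_id_with_parent(author, "doc-b"))
```

Whether including the parent id is workable depends on when the parent id becomes known, which is exactly the open question in this post.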

--
Dan Pilone
Managing Partner, Element 84 LLC



On Tue, Jul 26, 2011 at 6:25 PM, ajsie <[hidden email]> wrote:
CouchDB uses a 40-character SHA1 id, and they say that the risk of
collision is minimal.

I wonder if there is a risk that the id Elasticsearch auto-generates
will collide with another one, since it's only 22 characters long.
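
To put rough numbers on the question, here is a small sketch using the standard birthday approximation for uniformly random ids. It assumes the 40-character id is hexadecimal (4 bits per character, so 160 bits) and, as a guess, that the 22-character id uses a 64-symbol alphabet (at most 6 bits per character). The function name `collision_probability` is made up for this example.

```python
import math


def collision_probability(num_ids: int, bits: int) -> float:
    """Birthday approximation: chance that at least two of num_ids
    uniformly random ids of the given bit length are equal."""
    pairs = num_ids * (num_ids - 1) / 2
    # 1 - exp(-pairs / 2^bits), computed with expm1 to keep tiny values accurate.
    return -math.expm1(-pairs / 2.0 ** bits)


SHA1_BITS = 40 * 4           # 40 hex characters
BASE64_22_MAX_BITS = 22 * 6  # upper bound for 22 characters of a 64-symbol alphabet

for label, bits in [("40 hex chars", SHA1_BITS), ("22 base64 chars", BASE64_22_MAX_BITS)]:
    p = collision_probability(10**9, bits)
    print(f"{label}: up to {bits} bits, p(collision among 1e9 ids) ~ {p:.3e}")
```

The character count alone says little: what matters is how many random bits the id carries, and a denser alphabet packs more bits into fewer characters.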


Paul Loy
You can supply your own ids...
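
As a rough illustration of supplying your own id, the sketch below only builds (and does not send) an HTTP request that indexes a document under a caller-chosen id, here the SHA1 of its content. It assumes the REST convention of a PUT to `/{index}/{type}/{id}`; the host, index name, type name, and the helper `build_index_request` are hypothetical, and newer Elasticsearch versions use a different path layout.

```python
import hashlib
import json
import urllib.parse
import urllib.request


def build_index_request(base_url: str, index: str, doc_type: str,
                        doc_id: str, doc: dict) -> urllib.request.Request:
    """Build a PUT request that would index doc under an explicit id."""
    safe_id = urllib.parse.quote(doc_id, safe="")
    url = f"{base_url}/{index}/{doc_type}/{safe_id}"
    body = json.dumps(doc, sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


doc = {"title": "Why SHA1?", "author": "ajsie"}
# Caller-chosen id: SHA1 of the serialized document (40 hex characters).
doc_id = hashlib.sha1(json.dumps(doc, sort_keys=True).encode("utf-8")).hexdigest()

# Hypothetical local node, index, and type names.
request = build_index_request("http://localhost:9200", "posts", "post", doc_id, doc)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a server.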

On Tue, Jul 26, 2011 at 11:44 PM, Dan Pilone <[hidden email]> wrote:
Do you mean SHA1 of the document itself? [...]

--
---------------------------------------------
Paul Loy
[hidden email]
http://uk.linkedin.com/in/paulloy

kimchy
Administrator
The generated id is a type 4 (random) UUID (128 bits) that is then base64-encoded to save space.
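
A small sketch of why such an id comes out at 22 characters: 16 bytes encode to 24 base64 characters, of which the last two are padding. The URL-safe alphabet and the stripping of padding are assumptions made for this example, as is the function name `short_random_id`; the exact encoding Elasticsearch uses may differ.

```python
import base64
import uuid


def short_random_id() -> str:
    """Encode a random (version 4) UUID as unpadded URL-safe base64."""
    raw = uuid.uuid4().bytes  # 16 bytes = 128 bits
    encoded = base64.urlsafe_b64encode(raw)  # 24 characters, ending in "=="
    return encoded.rstrip(b"=").decode("ascii")


new_id = short_random_id()
print(new_id, len(new_id))

# The id can be decoded back to the original UUID by restoring the padding.
restored = uuid.UUID(bytes=base64.urlsafe_b64decode(new_id + "=="))
print(restored, restored.version)
```

Note that a version 4 UUID fixes 6 of its 128 bits for the version and variant fields, so 122 bits are random. The shorter id is a denser encoding of the UUID rather than a truncated hash.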
