Updating the fields in index

arien
Hi all,
To give a general idea of the problem I am facing, here is a very
simple example of indexing a document, as in the tutorial.

Let's say I have indexed a document with these fields:

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "admin",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}'

I want to update only the "user" field so that its value becomes
"arien". For that I have to do something like this:

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "arien",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}'

The problem is that I want to keep the values of "post_date" and
"message" unchanged without providing them again when updating the
same document id. For example, something like this:

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "arien"
}'

When querying (with the search API) I should still get all three fields
of the document. Instead, after the update I get only the "user" field,
because the new document overwrites the old one.

I am dealing with very large chunks of data at indexing time, using the
attachment mapper, and it is real overhead to resend all the fields
every time for even a small attribute change. Please provide some
suggestions regarding this.

Thank you,
arien

Re: Updating the fields in index

Nicolas Lalevée

On 24 August 2011 at 09:09, arien wrote:

> I want to retain the values of "post_date" and "message" unchanged
> without providing them at the time of the update [...]

Lucene, the underlying library used by Elasticsearch, doesn't support updating individual fields, and probably won't anytime soon. There has been some discussion about a workaround in Elasticsearch, but it is not there yet, and it really would be only a workaround.

If it fits your case, try to separate the fields that need to be indexed from the ones that don't. Let Elasticsearch handle the fields that need to be indexed, and for the others choose another tool to store them, such as a classical database that can update a single field of a document. For the indexed fields, you'll have no choice but to push the entire document on each update. For the others, just update what you need. When you query, search in Elasticsearch and reconcile the results with the external data store.
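A minimal sketch of this split, using two in-memory dictionaries as stand-ins for Elasticsearch and the external database. All names here (`search_index`, `side_store`, the helper functions) are hypothetical, and which fields count as "searchable" is an assumption made only for illustration:

```python
# In-memory stand-ins; a real setup would use Elasticsearch and a database.
search_index = {}   # doc id -> searchable fields (replaced as a whole on update)
side_store = {}     # doc id -> fields that are displayed but never searched


def index_document(doc_id, searchable, extra):
    """Store searchable fields in the index and the rest in the side store."""
    search_index[doc_id] = dict(searchable)
    side_store[doc_id] = dict(extra)


def update_extra_field(doc_id, field, value):
    """Change one non-indexed field without touching the search index."""
    side_store[doc_id][field] = value


def search(term):
    """Find ids whose searchable text contains `term`, then merge in side data."""
    results = []
    for doc_id, fields in search_index.items():
        if any(term.lower() in str(v).lower() for v in fields.values()):
            merged = {**fields, **side_store.get(doc_id, {})}
            results.append((doc_id, merged))
    return results


index_document("1",
               {"message": "trying out Elastic Search"},
               {"user": "admin", "post_date": "2009-11-15T14:12:12"})
update_extra_field("1", "user", "arien")   # no reindexing needed
hits = search("elastic")
print(hits)
```

The cost of this design is the reconciliation step at query time: every search result needs a second lookup in the external store.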

Nicolas


Re: Updating the fields in index

arien
Hi,
Thanks for your suggestion. I will consider it as one of the main ways to solve my issue.



2011/8/24 Nicolas Lalevée <[hidden email]>

> If the case fits for you, try to separate the fields that need to be indexed from the ones which don't. [...]



Re: Updating the fields in index

kimchy
Administrator
In reply to this post by arien
The simplest way to solve this is to get the document, update the relevant field, and then index the document again. You can use versioning to make sure no other update has "sneaked" in while you were doing the update.
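The get, modify, and reindex cycle with a version check could look roughly like the sketch below. It runs against an in-memory stand-in rather than a real cluster: `FakeIndex`, `VersionConflict`, and `update_field` are hypothetical names, and the version numbering here only imitates the idea of optimistic concurrency control, not the actual Elasticsearch API:

```python
class VersionConflict(Exception):
    """Raised when the stored version differs from the one the caller read."""


class FakeIndex:
    """In-memory stand-in for an index with per-document version numbers."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, source dict)

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, dict(source)  # copy so callers cannot mutate the store

    def index(self, doc_id, source, expected_version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"expected {expected_version}, found {current}")
        self._docs[doc_id] = (current + 1, dict(source))
        return current + 1


def update_field(index, doc_id, field, value, retries=3):
    """Get the document, change one field, and reindex it with a version check."""
    for _ in range(retries):
        version, source = index.get(doc_id)
        source[field] = value
        try:
            return index.index(doc_id, source, expected_version=version)
        except VersionConflict:
            continue  # another update got in first; re-read and try again
    raise VersionConflict(f"gave up after {retries} attempts")


idx = FakeIndex()
idx.index("1", {"user": "admin",
                "post_date": "2009-11-15T14:12:12",
                "message": "trying out Elastic Search"})
new_version = update_field(idx, "1", "user", "arien")
print(new_version, idx.get("1")[1])
```

The whole document is still sent on every update; the version check only guarantees that a concurrent change is detected instead of being silently overwritten.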

On Wed, Aug 24, 2011 at 10:09 AM, arien <[hidden email]> wrote:
> I want to retain the values of "post_date" and "message" unchanged
> without providing them at the time of the update [...]


Re: Updating the fields in index

Frifri
Let's say I save content in two mappings, of which I normally update (replace) only one. In both mappings I would use the same id. If both documents contain the same string in at least one field, is it possible to return only one hit, as if the string had been found in only one mapping (the MySQL equivalent of "group by id")?
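The thread does not say whether Elasticsearch can do this on the server side. One option, sketched here as an assumption, is to collapse the hits on the client after the search returns. The hit dictionaries use the `_type`, `_id`, and `_score` keys found in search responses; the function name `collapse_by_id` and the type names `attachment` and `comments` are hypothetical:

```python
def collapse_by_id(hits):
    """Keep only the highest-scoring hit for each document id."""
    best = {}
    for hit in hits:
        current = best.get(hit["_id"])
        if current is None or hit["_score"] > current["_score"]:
            best[hit["_id"]] = hit
    # Return the survivors in descending score order, like a normal result list.
    return sorted(best.values(), key=lambda h: h["_score"], reverse=True)


hits = [
    {"_type": "attachment", "_id": "1", "_score": 1.2},
    {"_type": "comments", "_id": "1", "_score": 0.7},
    {"_type": "comments", "_id": "2", "_score": 0.9},
]
collapsed = collapse_by_id(hits)
print([(h["_id"], h["_type"]) for h in collapsed])
```

Collapsing on the client makes paging inexact, because a page of raw hits may shrink after duplicates are removed.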

Thx

Re: Updating the fields in index

summersmile1984
This post has NOT been accepted by the mailing list yet.
In reply to this post by arien
The Autonomy IDOL search engine has a ReplaceFieldValue action that can update an indexed item. In fact, I am facing the same problem as you. Extracting the attachment file is a heavy job, and reindexing the item makes the work quite painful, especially when you have a lot of items to reindex. Take, for example, updating the view count or the reply count of a document: if you want to order search results by these numbers, you have to put them in the index rather than only storing them in the database.

The underlying implementation of update in Autonomy IDOL seems to create a new item by copying the indexed item and replacing the values given in the parameters, then deleting the older item some time later.

Maybe the versioning feature of Elasticsearch could make some sense for this scenario.
Just a suggestion; I am a newcomer to ES.

Re: Updating the fields in index

Frifri
In reply to this post by Frifri
Could it be possible to use Parent/Child? From my point of view that seems to be feasible, but a little slow as there are always two queries to be made. I'd use one mapping for the attachment and another for the comments.

Re: Updating the fields in index

Frifri
Yeah, it looks like that works. I had overlooked a post by kimchy.