How to update/delete indexed documents from ES index using mysql jdbc river

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

How to update/delete indexed documents from ES index using mysql jdbc river

dark_shadow
Hi,

I'm using jdbc mysql river plugin https://github.com/jprante/elasticsearch-river-jdbc for creating ES index. I have been able to index my documents successfully but I'm facing issues in updating/deleting indexed documents. My jdbc river is used with a sql query that uses multiple joins on tables and return results. These results are then indexed in ES. My problem is if I update some tables they will affect the results of that join query which should be reflected in ES index but ES index is not updating/deleting sql results from that join query. I found few threads where people are facing similar issue.

http://stackoverflow.com/questions/21260086/elasticsearch-river-jdbc-mysql-not-deleting-records

I'm using ES 1.1.0 with jdbc river version 1.1.0.2. There is one more thread from author itself where he states that deletions are no longer supported. He tells two methods of tackling the issue. One is re indexing itself and second is using some sql queries to update/delete indexed documents.
https://github.com/jprante/elasticsearch-river-jdbc/issues/202

Can anyone please tell me how can I tackle the issue. How can I update/delete already indexed documents. Can anyone please elaborate on the second method of updating/deleting indexed documents.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f1179d7d-4232-4444-b0ee-e7ac383a4bfc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: How to update/delete indexed documents from ES index using mysql jdbc river

joergprante@gmail.com
Can you give a minimal example of a query with the rows, and what rows are deleted then, so we can work through the issue?

The fundamental problem is that deleted rows in SQL are no longer available for creating deletion requests and so they can not be tracked over time - once they are gone, they are gone. The problem is known as "stale data". This can be solved either at a bigger scope (by using time windowed indexes where older indexes can be dropped) or by an extra DB mechanism to provide the IDs of the deleted docs after they are deleted (maybe by trigger), so they can be selected by JDBC plugin with a "select _optype, _id" construction. Note, at a certain size, deleting single docs in ES is not efficient.

To sync data between DB and ES, JDBC plugin is probably not smart enough (it is impossible to implement app-specific logic in JDBC plugin). So you should also consider to write a middleware app with specific logic that controls the deletions in the DB and after that deletes docs in ES.

Jörg




On Thu, Aug 7, 2014 at 10:44 AM, coder <[hidden email]> wrote:
Hi,

I'm using jdbc mysql river plugin https://github.com/jprante/elasticsearch-river-jdbc for creating ES index. I have been able to index my documents successfully but I'm facing issues in updating/deleting indexed documents. My jdbc river is used with a sql query that uses multiple joins on tables and return results. These results are then indexed in ES. My problem is if I update some tables they will affect the results of that join query which should be reflected in ES index but ES index is not updating/deleting sql results from that join query. I found few threads where people are facing similar issue.

http://stackoverflow.com/questions/21260086/elasticsearch-river-jdbc-mysql-not-deleting-records

I'm using ES 1.1.0 with jdbc river version 1.1.0.2. There is one more thread from author itself where he states that deletions are no longer supported. He tells two methods of tackling the issue. One is re indexing itself and second is using some sql queries to update/delete indexed documents.
https://github.com/jprante/elasticsearch-river-jdbc/issues/202

Can anyone please tell me how can I tackle the issue. How can I update/delete already indexed documents. Can anyone please elaborate on the second method of updating/deleting indexed documents.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f1179d7d-4232-4444-b0ee-e7ac383a4bfc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHPW5DaPGoewQJ6Qp0bBWubuFfaDUZ1x1eqOLrnZo9LrQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: How to update/delete indexed documents from ES index using mysql jdbc river

dark_shadow
Here goes my use case:

Table t1 --- >
id         a               b
123 somestring somestring

Table t2  --->
id           c                              d                          e
123 someIntegerCount somebooleanValue someString
select * from t1,t2 where t1.id=t2.id and t2.c > 0 and t2.d = 1;

which gives some rows as:

id                 a              b                             c                 d                              e 
123      someString   someString    someIntegerCount  somebooleanValue   someString
Now, In my use case the values of c and d fields in table t2 keeps changing frequently. So, I index only those rows for which c field > 0 (as count keeps changing) and d field = 1 (which means either enabled or disabled).

Now, First time indexing is done without any issues. Problem comes when I update these two fields and want ES to reindex the documents. Since there might be some documents for which earlier c field was 0 but now it is non-zero and similarly d field was 0 earlier but changed to 1. Now, I want ES ti reflect those changes.(I guess that what mongo river does and I expect mysql to work, automatic sync). Also, there will be few results which were coming earlier but not now. How to delete those docs from index ?

How can I accomplish this ?

I have tried to explain my problem in simplest manner by keeping things simple. Please ask questions if anything is not clear.

Thanks 

On Thursday, 7 August 2014 14:25:07 UTC+5:30, Jörg Prante wrote:
Can you give a minimal example of a query with the rows, and what rows are deleted then, so we can work through the issue?

The fundamental problem is that deleted rows in SQL are no longer available for creating deletion requests and so they can not be tracked over time - once they are gone, they are gone. The problem is known as "stale data". This can be solved either at a bigger scope (by using time windowed indexes where older indexes can be dropped) or by an extra DB mechanism to provide the IDs of the deleted docs after they are deleted (maybe by trigger), so they can be selected by JDBC plugin with a "select _optype, _id" construction. Note, at a certain size, deleting single docs in ES is not efficient.

To sync data between DB and ES, JDBC plugin is probably not smart enough (it is impossible to implement app-specific logic in JDBC plugin). So you should also consider to write a middleware app with specific logic that controls the deletions in the DB and after that deletes docs in ES.

Jörg




On Thu, Aug 7, 2014 at 10:44 AM, coder <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="GdyeYI8dyQcJ" onmousedown="this.href='javascript:';return true;" onclick="this.href='javascript:';return true;">mukul...@...> wrote:
Hi,

I'm using jdbc mysql river plugin <a href="https://github.com/jprante/elasticsearch-river-jdbc" target="_blank" onmousedown="this.href='https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2Fjprante%2Felasticsearch-river-jdbc\46sa\75D\46sntz\0751\46usg\75AFQjCNGQ5duwjRs6kqyFQ0HcPN8mcwk2LA';return true;" onclick="this.href='https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2Fjprante%2Felasticsearch-river-jdbc\46sa\75D\46sntz\0751\46usg\75AFQjCNGQ5duwjRs6kqyFQ0HcPN8mcwk2LA';return true;">https://github.com/jprante/elasticsearch-river-jdbc for creating ES index. I have been able to index my documents successfully but I'm facing issues in updating/deleting indexed documents. My jdbc river is used with a sql query that uses multiple joins on tables and return results. These results are then indexed in ES. My problem is if I update some tables they will affect the results of that join query which should be reflected in ES index but ES index is not updating/deleting sql results from that join query. I found few threads where people are facing similar issue.

<a href="http://stackoverflow.com/questions/21260086/elasticsearch-river-jdbc-mysql-not-deleting-records" target="_blank" onmousedown="this.href='http://www.google.com/url?q\75http%3A%2F%2Fstackoverflow.com%2Fquestions%2F21260086%2Felasticsearch-river-jdbc-mysql-not-deleting-records\46sa\75D\46sntz\0751\46usg\75AFQjCNFfh9K6AJHvJ5X10bewxAOloL5QSg';return true;" onclick="this.href='http://www.google.com/url?q\75http%3A%2F%2Fstackoverflow.com%2Fquestions%2F21260086%2Felasticsearch-river-jdbc-mysql-not-deleting-records\46sa\75D\46sntz\0751\46usg\75AFQjCNFfh9K6AJHvJ5X10bewxAOloL5QSg';return true;">http://stackoverflow.com/questions/21260086/elasticsearch-river-jdbc-mysql-not-deleting-records

I'm using ES 1.1.0 with jdbc river version 1.1.0.2. There is one more thread from author itself where he states that deletions are no longer supported. He tells two methods of tackling the issue. One is re indexing itself and second is using some sql queries to update/delete indexed documents.
<a href="https://github.com/jprante/elasticsearch-river-jdbc/issues/202" target="_blank" onmousedown="this.href='https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2Fjprante%2Felasticsearch-river-jdbc%2Fissues%2F202\46sa\75D\46sntz\0751\46usg\75AFQjCNFrag0JplVvqoL07yFDI7LVtxdQqw';return true;" onclick="this.href='https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2Fjprante%2Felasticsearch-river-jdbc%2Fissues%2F202\46sa\75D\46sntz\0751\46usg\75AFQjCNFrag0JplVvqoL07yFDI7LVtxdQqw';return true;">https://github.com/jprante/elasticsearch-river-jdbc/issues/202

Can anyone please tell me how can I tackle the issue. How can I update/delete already indexed documents. Can anyone please elaborate on the second method of updating/deleting indexed documents.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f1179d7d-4232-4444-b0ee-e7ac383a4bfc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/42039e84-86e3-4dff-baac-662174a5c1dc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: How to update/delete indexed documents from ES index using mysql jdbc river

joergprante@gmail.com
Only a quick thought, it seems you have a large table in RDBMS, and there are selectors ('where' condition) that control the dissemanition, what rows of the table have to be selected for further processing.

A first thought is to pull the whole database into ES and build filtered queries for the 'where' condition.

If the database table is steadily growing, it could be more feasible to fetch them by timestamp, e.g. once a day or per hour.

Jörg




On Thu, Aug 7, 2014 at 6:19 PM, coder <[hidden email]> wrote:
Here goes my use case:

Table t1 --- >
id         a               b
123 somestring somestring

Table t2  --->
id           c                              d                          e
123 someIntegerCount somebooleanValue someString
select * from t1,t2 where t1.id=t2.id and t2.c > 0 and t2.d = 1;

which gives some rows as:

id                 a              b                             c                 d                              e 
123      someString   someString    someIntegerCount  somebooleanValue   someString
Now, In my use case the values of c and d fields in table t2 keeps changing frequently. So, I index only those rows for which c field > 0 (as count keeps changing) and d field = 1 (which means either enabled or disabled).

Now, First time indexing is done without any issues. Problem comes when I update these two fields and want ES to reindex the documents. Since there might be some documents for which earlier c field was 0 but now it is non-zero and similarly d field was 0 earlier but changed to 1. Now, I want ES ti reflect those changes.(I guess that what mongo river does and I expect mysql to work, automatic sync). Also, there will be few results which were coming earlier but not now. How to delete those docs from index ?

How can I accomplish this ?

I have tried to explain my problem in simplest manner by keeping things simple. Please ask questions if anything is not clear.

Thanks 

On Thursday, 7 August 2014 14:25:07 UTC+5:30, Jörg Prante wrote:
Can you give a minimal example of a query with the rows, and what rows are deleted then, so we can work through the issue?

The fundamental problem is that deleted rows in SQL are no longer available for creating deletion requests and so they can not be tracked over time - once they are gone, they are gone. The problem is known as "stale data". This can be solved either at a bigger scope (by using time windowed indexes where older indexes can be dropped) or by an extra DB mechanism to provide the IDs of the deleted docs after they are deleted (maybe by trigger), so they can be selected by JDBC plugin with a "select _optype, _id" construction. Note, at a certain size, deleting single docs in ES is not efficient.

To sync data between DB and ES, JDBC plugin is probably not smart enough (it is impossible to implement app-specific logic in JDBC plugin). So you should also consider to write a middleware app with specific logic that controls the deletions in the DB and after that deletes docs in ES.

Jörg




On Thu, Aug 7, 2014 at 10:44 AM, coder <[hidden email]> wrote:
Hi,

I'm using jdbc mysql river plugin https://github.com/jprante/elasticsearch-river-jdbc for creating ES index. I have been able to index my documents successfully but I'm facing issues in updating/deleting indexed documents. My jdbc river is used with a sql query that uses multiple joins on tables and return results. These results are then indexed in ES. My problem is if I update some tables they will affect the results of that join query which should be reflected in ES index but ES index is not updating/deleting sql results from that join query. I found few threads where people are facing similar issue.

http://stackoverflow.com/questions/21260086/elasticsearch-river-jdbc-mysql-not-deleting-records

I'm using ES 1.1.0 with jdbc river version 1.1.0.2. There is one more thread from author itself where he states that deletions are no longer supported. He tells two methods of tackling the issue. One is re indexing itself and second is using some sql queries to update/delete indexed documents.
https://github.com/jprante/elasticsearch-river-jdbc/issues/202

Can anyone please tell me how can I tackle the issue. How can I update/delete already indexed documents. Can anyone please elaborate on the second method of updating/deleting indexed documents.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f1179d7d-4232-4444-b0ee-e7ac383a4bfc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/42039e84-86e3-4dff-baac-662174a5c1dc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGF0%2BzKfmtxfM_yaVpG8%2BnhHXmUztNQOuOOJehUt9K6-Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.