Static MySQL Data


banderon1
I have two tables (each with millions of records) in a MySQL database
that I want to be able to index and search. I love the features of
Elasticsearch, but I'm not sure how to approach this. The tables are
pretty static and change only once per day (via a cron job I run). Can
I get some feedback on how I would go about implementing Elasticsearch
with this kind of setup?

Re: Static MySQL Data

Ævar Arnfjörð Bjarmason

I have a similar setup, and I just have a cronjob that slurps all that
data up daily and spews it into ElasticSearch. You can either build a
new index daily (simpler) or update the data in-place.
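
For the "new index daily" variant, a common trick is to build a dated index and flip an alias once indexing finishes, so searches never see a half-built index. Below is a minimal sketch assuming ES on localhost:9200 and an alias named "products"; none of these names come from this thread.

    <?php
    // Minimal sketch of a daily rebuild, run from cron. The helper just
    // sends raw REST requests to Elasticsearch.
    function esRequest($method, $path, $body = null) {
        $ch = curl_init('http://localhost:9200' . $path);
        curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        if ($body !== null) {
            curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
        }
        $response = curl_exec($ch);
        curl_close($ch);
        return $response;
    }

    $today     = 'products-' . date('Ymd');
    $yesterday = 'products-' . date('Ymd', strtotime('-1 day'));

    esRequest('PUT', '/' . $today);          // 1. create the new dated index

    // 2. ... bulk-index all rows from MySQL into $today here ...

    // 3. Atomically repoint the "products" alias at the fresh index.
    esRequest('POST', '/_aliases', json_encode(array(
        'actions' => array(
            array('remove' => array('index' => $yesterday, 'alias' => 'products')),
            array('add'    => array('index' => $today,     'alias' => 'products')),
        ),
    )));

    esRequest('DELETE', '/' . $yesterday);   // 4. drop the old index

Queries always go through the "products" alias, so the swap is invisible to clients.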

Re: Static MySQL Data

Berkay Mollamustafaoglu-2
You may want to take a look at Scrutineer for inspiration: https://github.com/Aconex/scrutineer

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype




Re: Static MySQL Data

banderon1
In reply to this post by Ævar Arnfjörð Bjarmason
Could you point to some documentation about how to "slurp all that data up and spew it into ElasticSearch"? I'm a n00b, and have yet to find clear instructions in the ES documentation. I'm planning on using a PHP client (Elastica), but if that's not needed, let me know how you are doing it.

Thanks!



Re: Static MySQL Data

kimchy
Administrator
Read the data from MySQL and use the bulk API to index it into Elasticsearch.
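
For illustration, a bare-bones loader along those lines reads rows with PDO and POSTs them in batches to the _bulk endpoint: each document is an action line plus a source line, both newline-terminated. Table, column, and index names here are placeholders, not anything from this thread.

    <?php
    // Hypothetical sketch: stream rows out of MySQL and into the
    // Elasticsearch bulk endpoint, 5000 documents per request.
    function postBulk($body) {
        $ch = curl_init('http://localhost:9200/_bulk');
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $response = curl_exec($ch);
        curl_close($ch);
        return $response;
    }

    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass', array(
        PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false,  // stream rows, don't buffer them all
    ));
    $stmt = $pdo->query('SELECT id, title, body FROM items');

    $bulk  = '';
    $count = 0;
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        // Action line (_type matters on 0.x-era ES), then the source line.
        $bulk .= json_encode(array('index' => array(
            '_index' => 'items', '_type' => 'item', '_id' => $row['id'],
        ))) . "\n";
        $bulk .= json_encode(array('title' => $row['title'], 'body' => $row['body'])) . "\n";

        if (++$count % 5000 === 0) {  // flush a batch every 5k documents
            postBulk($bulk);
            $bulk = '';
        }
    }
    if ($bulk !== '') {
        postBulk($bulk);              // final partial batch
    }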




Re: Static MySQL Data

banderon1
Thank you, I am finally starting to understand how this works! I am able to create an index and add documents; however, when loading the records from MySQL, PHP crashes due to excessive memory usage. I am able to dump the data into a JSON text file (using SELECT CONCAT()). Can I index this "static" data to avoid the extra work on the server?
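
Since the rows are already dumped to disk, one way to sidestep the PHP memory limit is to stream that file line by line instead of materializing everything in an array. The sketch below assumes the CONCAT() dump produced one JSON document per line and reuses the hypothetical postBulk() helper from the earlier sketch. (The crash itself usually comes from fetching the whole result set at once; an unbuffered MySQL query or LIMIT/OFFSET paging avoids that on the database side.)

    <?php
    // Hypothetical sketch: stream a line-per-document JSON dump into
    // the bulk endpoint without ever holding the whole file in memory.
    $fh    = fopen('/tmp/items.json', 'r');
    $bulk  = '';
    $count = 0;

    while (($line = fgets($fh)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue;                 // skip blank lines in the dump
        }
        // Action line, then the pre-built JSON document verbatim.
        $bulk .= '{"index":{"_index":"items","_type":"item"}}' . "\n";
        $bulk .= $line . "\n";

        if (++$count % 5000 === 0) {
            postBulk($bulk);          // same helper as the earlier sketch
            $bulk = '';
        }
    }
    if ($bulk !== '') {
        postBulk($bulk);
    }
    fclose($fh);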




Re: Static MySQL Data

Michael Sick
It would help to know some details about how you're building the request. Which API are you using (bulk or index)? How many documents are you building up before submitting? ...
--Mike





Fwd: Static MySQL Data

Ævar Arnfjörð Bjarmason

In my case the whole thing is:

 * SELECT some stuff FROM table

 * Read that from the database with an iterator and buffer up however
   many documents make sense; in my case I buffer 1-10k documents
   depending on what I'm inserting, but it doesn't really matter that
   much. The bulk API is mainly "bulk" in the sense that you can send
   a lot of stuff over at once; it doesn't let ElasticSearch do any
   special optimizations when inserting the data.

 * If anything fails I retry up to 10 times, sleeping 1..10 seconds
   between attempts, respectively. Due to optimistic concurrency
   control you don't even have to parse the response to see which
   documents were successfully indexed and retry only the ones that
   weren't, but you can do that if you're so inclined.

All in all it's like any other program you'd write to move data
between systems. Does that answer your question?
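
A sketch of the retry loop described above, again using the hypothetical postBulk() helper from the earlier sketch, might look like this; the exact shape of the bulk response varies between ES versions, so the error check is only illustrative.

    <?php
    // Retry a failed bulk batch up to 10 times, sleeping 1, 2, ... 10
    // seconds between attempts. Blindly resubmitting the whole batch is
    // safe because re-indexing a document with the same id overwrites it.
    function postBulkWithRetry($body) {
        for ($attempt = 1; $attempt <= 10; $attempt++) {
            $decoded = json_decode(postBulk($body), true);
            if (is_array($decoded) && !bulkHasErrors($decoded)) {
                return true;
            }
            sleep($attempt);          // back off 1..10 seconds, respectively
        }
        return false;                 // give up after 10 attempts
    }

    // Scan the per-item results for an "error" entry (the response
    // format differs slightly across ES versions).
    function bulkHasErrors($decoded) {
        $items = isset($decoded['items']) ? $decoded['items'] : array();
        foreach ($items as $item) {
            $action = reset($item);   // e.g. the "index" result
            if (isset($action['error'])) {
                return true;
            }
        }
        return false;
    }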

Re: Static MySQL Data

banderon1
In reply to this post by Michael Sick
Right now I am using the Elastica PHP client. I am calling addDocuments, which uses the bulk API. I have over a million 'documents' that need to be indexed.
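
For what it's worth, a sketch of chunking that million documents into smaller addDocuments() calls follows. Class names here are from the namespaced Elastica API (2012-era releases spelled them Elastica_Client and so on), and fetchRows() is a stand-in for however you iterate your MySQL result set.

    <?php
    // Hypothetical sketch: buffer Elastica documents and flush one bulk
    // request per 5000 docs so PHP never holds the full million at once.
    $client = new \Elastica\Client(array('host' => 'localhost', 'port' => 9200));
    $type   = $client->getIndex('items')->getType('item');

    $buffer = array();
    foreach (fetchRows() as $row) {            // stand-in for your MySQL loop
        $buffer[] = new \Elastica\Document($row['id'], array(
            'title' => $row['title'],
            'body'  => $row['body'],
        ));
        if (count($buffer) >= 5000) {
            $type->addDocuments($buffer);      // one bulk request
            $buffer = array();                 // release memory between batches
        }
    }
    if ($buffer) {
        $type->addDocuments($buffer);          // final partial batch
    }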





Re: Static MySQL Data

banderon1
In reply to this post by Ævar Arnfjörð Bjarmason
That works, but I'm just trying to optimize the process. Since I already have the data in a JSON object, I'm wondering if I can skip MySQL in this case.



Re: Static MySQL Data

Michael Sick
If MySQL is your system of record, it seems best to fetch the data from there, since ES doesn't participate in distributed transactions AFAIK and you may have a consistency issue.

I always follow Ævar's advice above and build in control over the size of the batch submitted. It's not surprising that grabbing a batch of 1M anything can challenge memory, as you likely have a few copies hanging around while you're serializing/deserializing. Using the bulk API seems to help with routing and performance on the server side, but it's likely memory hungry on the client side, as you're building a very large JSON document.

Then I'd try varying the batch size to see what's optimal (10k, 100k, ...). If the big-load scenario is rare and you're only indexing some small percentage of your data on a daily basis, then I'd recommend optimizing only enough to know that you could do a full recovery this way if you had to.

--Mike
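
A crude way to act on that suggestion: time a full load at a few batch sizes and keep the winner. indexAllRows() below is a hypothetical wrapper around the batched loader sketched earlier, parameterized by batch size.

    <?php
    // Hypothetical timing harness: run the loader at several batch
    // sizes and print how long each full pass takes.
    foreach (array(1000, 10000, 100000) as $batchSize) {
        $start = microtime(true);
        indexAllRows($batchSize);
        printf("batch size %6d: %.1fs\n", $batchSize, microtime(true) - $start);
    }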
