May I use ES as DB to replace MongoDB?


谢乐冰
Not a joke.

We have an event log (userid, timestamp, action, entity, ...) that records players' essential activities and is used for customer service. The volume is around 10-15 million rows a day, and rows are kept for 3 months. The search conditions can be complicated, such as userid + time range + activities, or time range + activities, and so on.
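To make the search conditions concrete, here is a minimal sketch of how such a lookup could be written as a filter-only Elasticsearch request body. It assumes the `bool` query with `filter` clauses from recent Elasticsearch versions (older releases expressed the same idea differently), and the field names `userid`, `timestamp` and `action` are simply taken from the column list above. The helper `build_event_query` is hypothetical and only builds the JSON body; it does not talk to a cluster.

```python
import json
from datetime import datetime, timezone


def build_event_query(start, end, userid=None, actions=None):
    """Build a filter-only query body: time range, optional user, optional actions."""
    # The time range is always present; both example conditions include it.
    filters = [
        {"range": {"timestamp": {"gte": start.isoformat(), "lt": end.isoformat()}}}
    ]
    if userid is not None:
        # Exact match on a single user.
        filters.append({"term": {"userid": userid}})
    if actions:
        # Match any one of the listed activities.
        filters.append({"terms": {"action": list(actions)}})
    return {"query": {"bool": {"filter": filters}}}


if __name__ == "__main__":
    # userid + time range + activities
    body = build_event_query(
        start=datetime(2014, 1, 1, tzinfo=timezone.utc),
        end=datetime(2014, 1, 8, tzinfo=timezone.utc),
        userid="player-42",  # made-up value for illustration
        actions=["login", "purchase"],  # made-up values for illustration
    )
    print(json.dumps(body, indent=2))
```

Leaving out `userid` gives the "time range + activities" variant with the same function.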

Currently 3 solutions are considered:

1. Use a MongoDB cluster to hold the data.
2. Use ES to index the log and search it. Easy to set up and maintain.
3. Use HBase, but we would have to create multiple "indexes".

Any ideas about that? Thanks!
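For sizing any of the three options, the figures above translate roughly as follows. This is back-of-the-envelope arithmetic only; treating "3 months" as 90 days is an assumption, and the per-second rate is a daily average that ignores peaks.

```python
RETENTION_DAYS = 90  # assumption: "3 months" taken as 90 days
SECONDS_PER_DAY = 24 * 60 * 60


def retained_rows(rows_per_day, retention_days=RETENTION_DAYS):
    """Total rows held at steady state for a given daily volume."""
    return rows_per_day * retention_days


def average_rows_per_second(rows_per_day):
    """Average write rate if rows arrived evenly over the day."""
    return rows_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    low = retained_rows(10_000_000)
    high = retained_rows(15_000_000)
    print(f"{low:,} to {high:,} rows retained")
    # prints "900,000,000 to 1,350,000,000 rows retained"
    print(f"{average_rows_per_second(15_000_000):.1f} rows/s on average at the high end")
```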


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAGJwNY4JYTXUhdBJ_d77iOHSf%2Bo%3Djs3p0h1%2Bx7URpqKBAe5e6w%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: May I use ES as DB to replace MongoDB?

Eugene Strokin
It seems like you don't really need search, just filtering, so you'd use only a subset of Elasticsearch's features. But why do you think you cannot use ES as a DB? What is your concern?
Just so you know, I have been using ES as the only storage for one of my projects, a Big Data/big-traffic application, for the second year now. If you do things right, you should be all right as well.

Eugene

On Monday, January 13, 2014 5:31:24 AM UTC-5, Xie Lebing wrote:
Not a joke. [...]

Re: May I use ES as DB to replace MongoDB?

amos.wood
For one of our projects, we also use Elasticsearch as the sole database. The only consideration is that while gets by id are real-time, all other searches are subject to the "refresh interval" setting of the particular index/table. We overcame this problem by:

1. Setting the refresh_interval to 25ms.
2. Pausing for 25ms after a write to our service before returning a successful write to the client.
3. Putting an automatic retry mechanism on particular calls. This helped when the index servers had heavy traffic and the refresh actually took more than 25ms. This scenario happened when a client wrote a record and immediately wanted to get it by a field other than its id.
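The retry in step 3 can be sketched roughly like this. It is a minimal illustration, not the code from that project: `search_with_retry` and `FakeIndex` are hypothetical names, and `FakeIndex` only simulates an index where a freshly written document becomes searchable after a few polls. The 25ms default delay mirrors the refresh interval mentioned above.

```python
import time


def search_with_retry(search, attempts=5, delay_s=0.025):
    """Call search() until it returns hits or the attempts run out."""
    for attempt in range(attempts):
        hits = search()
        if hits:
            return hits
        # Wait roughly one refresh interval before polling again,
        # but do not sleep after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return []


class FakeIndex:
    """Stand-in for an index whose new document is not searchable right away."""

    def __init__(self, visible_after_calls):
        self.calls = 0
        self.visible_after_calls = visible_after_calls

    def search(self):
        self.calls += 1
        if self.calls > self.visible_after_calls:
            return [{"id": 1}]
        return []


if __name__ == "__main__":
    index = FakeIndex(visible_after_calls=2)
    print(search_with_retry(index.search), "after", index.calls, "calls")
```

The caller still has to decide what an empty result after the last attempt means, since the record may genuinely not exist.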

On Monday, January 13, 2014 11:24:00 AM UTC-6, Eugene Strokin wrote:
It seems like you don't really need a search, but just filtering [...]

Re: May I use ES as DB to replace MongoDB?

davrob
From my understanding, which is admittedly limited, there is still potential to lose data with Elasticsearch.

Even with the new Snapshot API running regularly, if all indexes get corrupted there is no guarantee of a 100% backup and restore, because you would lose any data added or updated after your last snapshot.



On Tuesday, 14 January 2014 13:23:55 UTC, amos.wood wrote:
For one our projects, we also use Elasticsearch as the sole database. [...]

Re: May I use ES as DB to replace MongoDB?

Eugene Strokin
You are correct. But how is this different from any other DB?
I guess the question is more like: if I'm running ES under normal conditions, could an index get corrupted?
If it is a hardware issue and you have replication switched on, then you wouldn't be affected much. Your system will continue functioning, but the cluster state would become yellow. You'd need to replace the node, and that's it.
Some people have claimed that they experienced sudden index corruption with data loss. I myself never saw anything like this. I have done stupid things a few times and felt close to a heart attack, but the data wasn't lost in the end, and again, I had nothing to blame but myself.

Regarding stability, I can say that ES has not given us any problems. I have successfully done the following in a production environment with zero downtime:
- adding nodes and replication
- transitioning data to another data center
- adding more clients
Etc...

I'd really like to hear from people who experienced data loss. If someone provided details, it would help us understand what went wrong and what we should avoid doing.
But besides claims that there are such cases, I haven't heard anything else.

Eugene


Re: May I use ES as DB to replace MongoDB?

lebowitz
I was talking about using ES as a system of record with my friendly IT director today. We were brainstorming about how "backup" would work.

Lucene index segments are immutable once written, so we can think about ES data as a transaction log. We can recreate the data as of a given time from _source, using a scan/scroll archive of docs taken at an interval, e.g. every hour. This is exactly the same as backing up DB transaction logs.
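As a rough sketch of the archive idea: each hit coming out of a scroll carries an `_id` and a `_source`, so an interval archive can be as simple as one JSON line per document. The function names `archive_hits` and `read_archive` are hypothetical, the scroll itself is not shown, and the in-memory list below only stands in for the hits a real scroll would return.

```python
import io
import json


def archive_hits(hits, out):
    """Write each hit's _id and _source as one JSON line; return the count."""
    count = 0
    for hit in hits:
        record = {"_id": hit["_id"], "_source": hit["_source"]}
        out.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


def read_archive(inp):
    """Yield (doc_id, source) pairs from an archive, ready to be re-indexed."""
    for line in inp:
        if line.strip():
            record = json.loads(line)
            yield record["_id"], record["_source"]


if __name__ == "__main__":
    # Made-up hits standing in for one hour's worth of scrolled documents.
    hits = [
        {"_id": "a", "_source": {"userid": "u1", "action": "login"}},
        {"_id": "b", "_source": {"userid": "u2", "action": "purchase"}},
    ]
    buffer = io.StringIO()
    print(archive_hits(hits, buffer), "docs archived")
    buffer.seek(0)
    for doc_id, source in read_archive(buffer):
        print(doc_id, source)
```

This only covers documents that still exist when the archive runs; deletes between two runs would need separate handling.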

Re: May I use ES as DB to replace MongoDB?

davrob
In reply to this post by Eugene Strokin
Hi Eugene,

Thanks for your comments - I'll do my best to explain where I am coming from, and to address some of the issues you have raised.

Firstly, where I'm coming from: the data I'm holding and searching against needs to be 100% backed up because it needs to be audited in the future.  For that reason the data is held on an old fashioned multi-master replicated relational DB.

In terms of the issues you raised:

1) "But how is this different from any other DB?"

i) With relational DBs, replaying the transaction logs to recover any data that hasn't been backed up is part of the strategy. I've heard of people doing this with ES, but it is not well documented anywhere. Additionally, the transaction logs, to my limited understanding, are kept in the same area as the index files and can suffer corruption. I think there may be some monitoring in version 1.0 to stop ES writing to disk before the files become corrupted, which would help. But the first point stands: there is no clear transaction log replay strategy outlined for Elasticsearch.

ii) Multi-master replication: no doubt it's possible to arrange JMS queues or Hazelcast/Coherence grids to do this, but a built-in solution would be useful.

2) Examples of data loss: when upgrading Elasticsearch versions, I've ended up losing all data, no doubt through my own fault. Maybe I'd have been more careful, and read the upgrade instructions more carefully, if I had known that my data was not backed up in the relational database. But it is definitely something that plays on my mind: "If I screw up this upgrade process, or misunderstand it, then that's it, my data is gone."

So I would probably add the following, although I could be wrong, because I have not read every blog relating to ES upgrades:

1) "But how is this different from any other DB?"
iii) There is no clear, consistent, well-documented process for upgrading Elasticsearch versions, particularly when the underlying Lucene version changes.

David.

On Tuesday, 14 January 2014 20:13:22 UTC, Eugene Strokin wrote:
You are correct. But how this is different from any other DB? [...]

Re: May I use ES as DB to replace MongoDB?

joergprante@gmail.com
Note, there is a valuable snapshot/restore facility coming in ES 1.0.0, with incremental snapshots.


Jörg


On Thu, Jan 16, 2014 at 3:37 PM, Craig Lebowitz <[hidden email]> wrote:
I was talking about using ES as a system of record with my friendly IT director today. [...]