Please, tell about the success story about ES usage on production

vsiryi
I want to convince my customer to use ES. To do this I need success
stories about ES usage in production. I tried to find such
information on the official ElasticSearch site and on Google, but
found nothing.

I'd be very happy if you could write how many documents you are
indexing and approximately how many requests you serve daily. Links
to production applications would be very helpful.

Thanks!

Best regards, Vitalii Siryi

Re: Please, tell about the success story about ES usage on production

Alexander Reelsen
Hi

On Nov 30, 11:33 am, Vitalii Siryi <[hidden email]> wrote:
> I'd be very happy if you could write how many documents you are
> indexing and approximately how many requests you serve daily. Links
> to production applications would be very helpful.
We are using it as a product search engine; you can see it live at
http://www.lusini.de

The document count is quite low, more than 200k documents, currently
on one node. We have never had performance issues. We run lots of
faceted queries with about ten facets and, of course, some filters;
those queries get a bit slower but are still fast enough for us.
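
For readers unfamiliar with this kind of request, here is a minimal sketch of a faceted, filtered search body. It assumes the `facets` section and `filtered` query of the elasticsearch releases current when this thread was written (later versions replaced facets with aggregations). The field names and the helper `build_product_search` are hypothetical and are not taken from the lusini.de setup.

```python
import json


def build_product_search(text, facet_fields, filters):
    """Build a search body with one terms facet per field.

    text         -- free-text query string
    facet_fields -- field names to facet on (hypothetical names)
    filters      -- dict of field -> exact value that every hit must match
    """
    text_query = {"query_string": {"query": text}}
    if filters:
        # Wrap the text query so the term filters restrict the result set.
        term_filters = [{"term": {f: v}} for f, v in sorted(filters.items())]
        query = {"filtered": {"query": text_query,
                              "filter": {"and": term_filters}}}
    else:
        query = text_query
    facets = {f: {"terms": {"field": f, "size": 10}} for f in facet_fields}
    return {"query": query, "facets": facets}


if __name__ == "__main__":
    body = build_product_search("table cloth", ["brand", "color"],
                                {"in_stock": True})
    print(json.dumps(body, indent=2, sort_keys=True))
```

The resulting dict would be sent as the JSON body of a search request; each extra facet adds work per query, which matches the "a bit slower" observation above.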

To get the product data into elasticsearch we implemented a river,
which polls for updates every n seconds and streams the JSON data, so
that when downloading a large payload we do not have to wait until
all of the data has arrived. We have several thousand updates a day
(more likely around ten thousand).
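
The streaming idea can be illustrated with a small sketch. This is not the river itself (rivers ran inside elasticsearch and were written in Java); it assumes, purely for illustration, that the update feed is newline-delimited JSON, and the function name `iter_updates` is hypothetical.

```python
import io
import json


def iter_updates(stream):
    """Yield one decoded document per non-empty line, as soon as it is read.

    Because this is a generator, the caller can start indexing the first
    documents while the rest of the feed is still being downloaded.
    """
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    # Stand-in for an HTTP response body that arrives line by line.
    feed = io.StringIO('{"id": 1, "name": "napkin"}\n\n{"id": 2, "name": "apron"}\n')
    for doc in iter_updates(feed):
        print(doc["id"], doc["name"])
```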

Before elasticsearch we had a lightning fast but unmaintainable
self-written solution based on bobo and zoie, which we replaced in
order to have a simpler solution that all developers understand well.

Of course we do not expose elasticsearch directly to browsers; we have
another component in between, which can also do things like redirecting
certain search terms to landing pages.
To be honest, I cannot tell how many requests are coming in per day,
but I guess it is somewhat below 100k.

Hope this helps. In case of questions, feel free to ask.


--Alexander

Re: Please, tell about the success story about ES usage on production

James Cook-3
In reply to this post by vsiryi
http://www.penpalkidsclub.com/

Written using ES as the only form of persistence. Went live 7/2011.

Re: Please, tell about the success story about ES usage on production

Michael Sick
James,

Was curious what settings/features you consider most important to configure/use when using ES without a secondary persistence mechanism. Perhaps this should be a separate thread - but I'm very curious what your experiences are here.

Thanks,
--Mike


Re: Please, tell about the success story about ES usage on production

Andy-2
In reply to this post by Alexander Reelsen
Alexander,

What made bobo and zoie unmaintainable? How is elasticsearch more
maintainable?

You said the bobo/zoie solution was "lightning fast." Was it
significantly faster than elasticsearch?

Thanks.


Re: Please, tell about the success story about ES usage on production

Alexander Reelsen
Hi

On Dec 1, 2:24 am, Andy <[hidden email]> wrote:
> What made bobo and zoie unmaintainable? How is elasticsearch more
> maintainable?
The bobo/zoie implementation was developed by a dev keen on JEE, which
meant it had tons of layers and was poorly accessible via HTTP: one
servlet, one API call, where everything from faceted queries up to
suggest was done by appending gazillions of parameters.

I do not want to flame or disparage zoie or bobo here; they are good
tools. It was really our implementation that made us switch to
ES. :-)

The good part for us is that we do not have to care that much about
the product; we only hacked a river on top of it. That is much less
code for us to care about, which makes it more maintainable overall,
and every developer on the team understands our search solution
without digging into lucene/bobo/zoie internals.

> You said the bobo/zoie solution was "lightning fast." Was it
> significantly faster than elasticsearch?
Faceting is really fast with lots of data when using bobo. However, as
we do not have that much data in one index, we don't care. We are more
than happy with ES speed.


--Alexander

Re: Please, tell about the success story about ES usage on production

Andy-2
I see.

Did you look at Sensei? It's a search engine built using bobo and
zoie. Just wondered if Sensei is easier to use.


Re: Please, tell about the success story about ES usage on production

Alexander Reelsen
Hey

On Dec 1, 9:51 am, Andy <[hidden email]> wrote:
> I see.
>
> Did you look at Sensei? It's a search engine built using bobo and
> zoie. Just wondered if Sensei is easier to use.
I didn't know about it when we started investigating elasticsearch.

Also, it looks more complex, as you have to do more manual tasks to get
it up and running (e.g. ZooKeeper). Worst of all, I read the term
"schema" several times in the documentation and did not like that :-)


--Alexander

Re: Please, tell about the success story about ES usage on production

James Cook-3
In reply to this post by Michael Sick
Hi Michael,

My biggest worries are:
  • Backup/Restore
  • Split Brain (really, this is my number one concern. Very destructive and almost no way to recover.)
Take a look at these threads:
  • http://goo.gl/xIXkj
  • http://goo.gl/Z7DKO

-- jim

Re: Please, tell about the success story about ES usage on production

Michael Sick
James,

I'm with you on Backup/Restore. Shay has indicated that it's a high priority. We need it to satisfy enterprisey customers who always want a stable offsite backup. It would also be a great way to manage pushing/pulling time-based indexes to and from a cluster. We're likely to have one index per day and would like to roll them off the back end after N days. When a user wants to see data past the N-day threshold, it would be nice to simply request the daily file from the tape backup system and import it back into the system. We can accomplish the same thing with exports of the _source field or even of the original document (XML in our case, and we will likely back up both), but having indexes at the ready would be very slick.

Not sure I understand the split brain issue, but I'll be doing some reading up.
--Mike


Re: Please, tell about the success story about ES usage on production

Drew Raines-2
In reply to this post by James Cook-3
James Cook wrote:

> * Split Brain (really, this is my number one concern. Very destructive
>   and almost no way to recover.)

I suggest you try ZooKeeper discovery.  It should make split-brain
difficult to encounter.

https://github.com/elasticsearch/elasticsearch/pull/1057

-Drew

Re: Please, tell about the success story about ES usage on production

Karussell
In reply to this post by Michael Sick


On 2 Dec., 17:08, Michael Sick <[hidden email]> wrote:
> James,
>
> I'm with you on the Backup/Restore. Shay has indicated that it's a high
> priority. We need it to satisfy enterprisy type customers that always want
> a stable offsite backup. It would also be a great way to manage
> pushing/pulling time based indexes from a cluster. We're likely to have an
> index/day and would like to roll them off the back-end after N days.

In your case, does rolling off mean deleting them from disk or just
no longer searching them?

Here is some code to do rolling indices:
https://github.com/elasticsearch/elasticsearch/issues/1500

Then, after flushing, it should even be safe to rsync them to another
location and copy them back later.

Regards,
Peter.

Re: Please, tell about the success story about ES usage on production

Michael Sick
Peter,

Yes, rolling off means that the index for a given day has become older than our current online window and is eligible for archiving on tape or at another remote location not available to the cluster. So if we're keeping daily indexes for 100 days, an index can be backed up and sent to tape on its day 101.
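
The retention rule above can be sketched as follows. This is only an illustration under stated assumptions: the `logs-YYYY.MM.DD` naming scheme, the prefix, and the helper `indices_to_archive` are hypothetical, and the code only selects index names; it does not talk to a cluster or a tape system.

```python
from datetime import date, datetime


def indices_to_archive(existing, today, keep_days=100, prefix="logs-"):
    """Return the daily indices that have left the online window, oldest first.

    An index counts as being on "day 1" on its own date, so with
    keep_days=100 it becomes eligible on day 101, i.e. once it is
    at least 100 days old.
    """
    expired = []
    for name in existing:
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # prefix matched but the suffix is not a date
        if (today - day).days >= keep_days:
            expired.append((day, name))
    return [name for _, name in sorted(expired)]


if __name__ == "__main__":
    names = ["logs-2011.09.01", "logs-2011.09.02", "logs-2011.12.10", "other"]
    print(indices_to_archive(names, today=date(2011, 12, 10)))
```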

Thanks for the pointer, a few questions:

1) Are you using Index Templates with this method?
2) After an index is flushed (and even closed), where do we reliably copy it from, and how do we make sure we got all of the needed parts/shards?
3) Are you using this in a production system? Just curious how it's working out.

Thanks for the response! --Mike


Re: Please, tell about the success story about ES usage on production

Ævar Arnfjörð Bjarmason
In reply to this post by Michael Sick
On Fri, Dec 2, 2011 at 17:08, Michael Sick
<[hidden email]> wrote:
> I'm with you on the Backup/Restore. Shay has indicated that it's a high
> priority. We need it to satisfy enterprisy type customers that always want a
> stable offsite backup.

I set up a production deployment of ES that does tens of millions of
queries per day, and I solve this by not having ES be the primary
datastore for anything; it's just treated as a specialized index.

I.e. the primary datastore is data scattered through various RDBMSs,
and I have a cronjob that does daily aggregations of all that data
into a flat daily rotating table that'll become the ElasticSearch
index.

Then to populate the index I effectively do a SELECT * from that table
and inject the rows into a new daily ES index via the bulk API.

This means that:

 * In an organization that's used to managing production data via
   RDBMSs there's no new store of production data, just a specialized
   index.

 * The ES index can be nuked at any time and we can resume search
   operations in the time it would take to run that SELECT * > ES
   cronjob. Currently that's around 10 minutes.

 * We don't have to set up anything new to back up / manage the
   data. E.g. we have regular snapshots of production data that are
   moved to dev environments. The snapshot just copies the RDBMSs, and
   then a cronjob in the dev environment populates the dev
   ElasticSearch index (which'll by definition be equivalent to
   production).

Now in my case the ElasticSearch dataset isn't that large (it
comfortably fits in RAM on one machine), and I only generate new
indexes daily, but I don't see any inherent reason for why this
strategy couldn't be adapted for larger data / data that's changing
all the time.

Setting it up like this did a lot to alleviate concerns about
introducing new technology in my organization.
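
The "SELECT * into the bulk API" step described above could be sketched roughly like this. It is a minimal illustration, not the actual cronjob: the row dicts stand in for the result of the SELECT, the index name and `id` column are hypothetical, and `_type` is included because elasticsearch releases of that era expected it (newer versions do not). The function only builds the newline-delimited request body that would be POSTed to the `_bulk` endpoint.

```python
import json


def bulk_index_body(rows, index, doc_type="doc", id_field="id"):
    """Turn row dicts into a bulk request body.

    The bulk format is one action line followed by one source line per
    document, each terminated by a newline.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index,
                            "_type": doc_type,
                            "_id": row[id_field]}}
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(row, sort_keys=True))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    print(bulk_index_body(rows, "search-2011.12.05"), end="")
```

Because the index is rebuilt from the table each day, a failed or corrupted index can simply be dropped and regenerated by rerunning the job.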

Re: Please, tell about the success story about ES usage on production

James Cook-3
In reply to this post by Drew Raines-2
I've read the pull request, but I have no experience with ZooKeeper.

ZooKeeper uses a fixed list of ZooKeeper nodes, so it’s quite easy for it to decide if quorum is present or not.

Does this comment mean I have to have a few nodes dedicated to just running ZooKeeper, or does it mean my application nodes are fixed? Because I have no fixed nodes. Amazon manages my instances for me, and its services will create new nodes when demand is high and destroy nodes when demand lessens. I don't know the IPs of these nodes, nor do they have hard disks (EBS on AWS).




Re: Please, tell about the success story about ES usage on production

kimchy
Administrator
Note that with ZooKeeper you still have split brains; you just get to a state of no availability when it happens (as far as I know). You can get similar behavior with the minimum_master_nodes setting in elasticsearch discovery (that's not to say that a ZooKeeper discovery module is not cool).
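
To make the quorum idea concrete, here is a small sketch of the arithmetic usually applied when choosing minimum_master_nodes: a strict majority of the master-eligible nodes. The helper names are hypothetical, and the code models only the counting rule, not elasticsearch's actual election.

```python
def minimum_master_nodes(master_eligible):
    """Strict majority of the master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(visible, master_eligible):
    """True if one side of a partition sees enough nodes to elect a master."""
    return visible >= minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    # Five nodes split 3/2: only the larger side may elect a master.
    print(can_elect_master(3, 5), can_elect_master(2, 5))
    # Four nodes split 2/2: neither side may, which is the
    # "no availability" state instead of two diverging masters.
    print(can_elect_master(2, 4), can_elect_master(2, 4))
```

Note that the rule needs a known count of master-eligible nodes, which is exactly the difficulty when the number of nodes changes dynamically.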


Re: Please, tell about the success story about ES usage on production

James Cook-3
Can you get to "no availability" using minimum_master_nodes when you have a totally dynamic collection of nodes? (I don't know how many will be created/destroyed by the external manager to handle load.)

Re: Please, tell about the success story about ES usage on production

ppearcy
In reply to this post by vsiryi
A little late on this thread, but figured I'd share my experience. We
were able to replace two enterprise search systems. One was a legacy
system that we wrote an emulation layer on top of to act as a drop-in
replacement. The other search system was costing way too much money,
the level of support for issues I ran into was very poor, even
with harassing people on a daily basis, and the performance wasn't
that good, even after jumping through some hoops on my side to
optimize.

We compared elasticsearch to Solr back in fall of 2010, and at that
time elasticsearch had many compelling features that differentiated it
from Solr. Without tuning anything, elasticsearch was 10x faster. I
actually assumed my tests were broken. Now, I could probably have
gotten Solr to the same performance level, but why go through the
effort?

In summary:
- elasticsearch saved my company probably 50K / year
- It improved performance over the systems I replaced by 10x
- Enabled lots of new features we didn't previously have
- Shay and others on the discussion groups provide a great level of
support.
- Scales horizontally... just throw new servers into the cluster to
add capacity

We've had a couple of hiccups around network partitions. Early
versions could nuke some data. 0.16 fixed most of these issues, but we
still had a few indices corrupted on this release after a major
network event.

Best Regards,
Paul
