Elasticsearch

Elasticsearch

thomasmueller2@hushmail.com
Hello,

We had a look at GSA and OpenSearchServer and are wondering whether Elasticsearch is something similar (a standalone full-text retrieval system),
e.g. one where the end user can enter search terms in a browser-based search mask?


Thank you for any feedback!

Thomas


Re: Elasticsearch

Michael Sick
Hi Thomas,

I'll help as best I can. First, I'm assuming GSA means Google Search Appliance, rather than that you were looking for solutions on the General Services Administration site (which could make sense with the word "on").

ElasticSearch is best described by its home page: "Distributed, RESTful, Search Engine built on top of Apache Lucene". It is not an appliance like GSA, and it is not a crawler either.

As for OpenSearchServer (OSS), here is how ES compares against the feature list on the OSS page:
  • A crawler allows you to index web pages, documents from files on local and remote systems and contents from any JDBC Database, such as Oracle, MySql, and Microsoft SQL Server and more,
ES does not have crawler modules AFAIK. ES is built for near real-time indexing and retrieval. The projects that I've done and seen focus less on nightly batch updates than on continuous feeds. ES's abstraction for this type of work is the River API (see http://www.elasticsearch.org/guide/reference/river/); there are Rivers for several data sources. Most often, though, people just use the Index API and submit content from their application.
  • Full text analyzers and filters allowing optimized and efficient searches in 16 languages and indexing performance,
Yes. ES leverages Lucene, which is a very mature text analysis / search system. I'd be surprised (though it's happened before) to see ES/Lucene outdone on the search basics.
  • An indexer that creates, updates the index and presents the answers to queries using the most efficient algorithms for best performance and response times,
Yes ES has an indexer and yes it is fast.
  • Html renderer allowing an easy integration of the Search box in an html/xhtml page, working with php and .Net, client library and xml over http API.
No. ES works over REST using JSON, and I'd argue there are benefits to not building a 3rd party HTML requirement into your application. It's likely just a style preference, but where I can, I like to build open.
  • Parsers allowing you to get content and metadata from most documents and formats, such as MS Office, OpenOffice, html/xhtml, xml, Adobe pdf, rtf, txt, mp3/4, wav, torrents and more.
Yes to some. ES leverages Apache Tika for parsing documents. Not sure what's available for audio files.
  • A series of caches to accelerate processes and deliver powerful search applications,
Yes. See the Filter API - it has quite powerful parsing and I've found ES's performance to be solid and it holds up under continuous updates to the index.
  • A monitoring and administration module offering an alerting service which checks that your index is always updated and working well and that the necessary hardware resources are available.
ES can integrate with monitoring solutions directly or via JMX. The monitoring is "good enough" - I'd like to see more bright, shiny monitoring tools going forward.
  • An integrated Scheduler service can be used to create simple or complex jobs and run them automatically.
No. Not sure why an internal scheduler would be an advantage. ES does keep track of TTL (Time to Live) for submitted documents and will delete/evict on a schedule.
  • Comprehensive online documentation to provide you all the help you might need when learning to use features and creating your applications,
Wish there was an ES book but the documentation works.
  • Advanced functionality: faceting, clustering, filters, snippets, synonyms, stopwords, categorization, “find similar”, automatic thumbnail screenshot inclusion,
Facets - yes, quite strong. Filters/Snippets - not sure what is being described. Synonyms/Stop Words/Similar - yes, in Lucene.  Thumbnailing - I don't think so.
  • An OpenSearchServer Drupal Module, a Wordpress module.
No, but there are PHP clients, and the core API is REST/JSON, so there's nothing you can't do in PHP.
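
To make the "Index API plus REST/JSON" points above a bit more concrete, here is a minimal sketch in Python of the two requests an application behind a browser search box would typically build. Everything specific in it is an assumption for illustration: a node on localhost:9200, the hypothetical index name "docs" and type name "page", and the helper names index_request and search_request. The code only builds the requests; it does not send them.

```python
import json
from urllib.parse import quote

# Assumption: a single local node on the default HTTP port.
BASE_URL = "http://localhost:9200"


def index_request(index, doc_type, doc_id, document):
    """Return (method, url, body) for indexing one JSON document.

    Hypothetical helper: uses the /{index}/{type}/{id} URL layout
    of the ES releases current when this thread was written.
    """
    parts = [quote(str(p), safe="") for p in (index, doc_type, doc_id)]
    url = "/".join([BASE_URL] + parts)
    return "PUT", url, json.dumps(document, sort_keys=True)


def search_request(index, user_input, size=10):
    """Return (method, url, body) for a full-text search.

    user_input is whatever the end user typed into the search box;
    it is passed to a query_string query.
    """
    url = "{}/{}/_search".format(BASE_URL, quote(index, safe=""))
    body = {"query": {"query_string": {"query": user_input}}, "size": size}
    return "POST", url, json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    requests_to_show = (
        index_request("docs", "page", 1,
                      {"title": "Hello", "body": "full text search"}),
        search_request("docs", "full text"),
    )
    for method, url, body in requests_to_show:
        print(method, url)
        print(body)
```

Any HTTP client (PHP, .Net, curl, and so on) can send these requests. The search page itself stays in your application: it posts the user's text to your code, your code sends the search request, and you render the hits from the JSON response however you like. Later Elasticsearch releases changed the document URL layout, so check the documentation for the version you run.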

Hope this helps --Mike


In reply to the message of Mon, Mar 12, 2012 at 2:55 PM.