[just pushed] _all field

kimchy
Administrator
Just pushed support for the _all field. It is documented here: http://github.com/elasticsearch/elasticsearch/issues/issue/63. The _all field is basically a field that includes one or more document fields, which makes it easy, for example, to search across all of a document's content with a queryString query. It can easily be disabled, or you can pick and choose which fields end up in the _all field.
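
To make the idea concrete, here is a minimal Python sketch of a composite field. It is not Elasticsearch's actual implementation, and the helper name `build_all_field` and its `enabled`, `include`, and `exclude` parameters are hypothetical. It only illustrates the three behaviours described above: include everything, disable the field, or pick which fields go in.

```python
def build_all_field(doc, enabled=True, include=None, exclude=None):
    """Return the text of a composite "_all"-style field, or None if disabled.

    Hypothetical helper for illustration only; not an Elasticsearch API.
    """
    if not enabled:
        return None
    exclude = set(exclude or ())
    parts = []
    for name, value in doc.items():
        # Skip fields outside an explicit include list, if one was given.
        if include is not None and name not in include:
            continue
        # Skip fields that were explicitly excluded.
        if name in exclude:
            continue
        parts.append(str(value))
    # The composite field is simply the included values joined together.
    return " ".join(parts)


doc = {"title": "Quick start", "content": "Install and run", "author": "kimchy"}
all_text = build_all_field(doc)
without_author = build_all_field(doc, exclude=["author"])
```

A search over the composite text then matches a term no matter which included field it originally came from.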

But there are no free bunnies in software: an enabled _all means more CPU cycles when indexing and a larger index (but hey, we're distributed, right? Just Add Machines(tm)). I believe _all is very important, especially when talking about rich documents rather than simple ones with just "title" and "content".
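
A rough way to picture that cost, as a sketch under loud assumptions: whitespace splitting stands in for real analysis, and the hypothetical `token_count` helper below ignores storage, analyzers, and everything else that affects real numbers. If every field is also fed into a composite field, each value is tokenized a second time.

```python
def token_count(doc, all_enabled=True):
    """Count whitespace-separated tokens analyzed for one document.

    Hypothetical helper; whitespace splitting is an assumed stand-in
    for whatever analysis the search engine really performs.
    """
    field_tokens = sum(len(str(value).split()) for value in doc.values())
    # With a composite field that includes every field, each value is
    # tokenized again as part of the composite text.
    all_tokens = field_tokens if all_enabled else 0
    return field_tokens + all_tokens


doc = {"title": "Quick start", "content": "Install and run"}
with_all = token_count(doc, all_enabled=True)
without_all = token_count(doc, all_enabled=False)
```

In this toy model the analysis work doubles when every field is included. The real overhead depends on the documents and on which fields are included.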

The current decision is to enable _all by default, and to include all fields in _all by default as well. This means the initial user experience will be very good in terms of usability, but indexing will be slower (how much slower really depends on the document and such). What do you think? Does this default make sense?

-shay.banon

Re: _all field

egaumer
On Mar 16, 5:11 pm, Shay Banon <[hidden email]> wrote:

> Just pushed support for _all field. It is documented here: http://github.com/elasticsearch/elasticsearch/issues/issue/63. The all field
> is basically a field that includes one or more document fields, allowing,
> for example, to simply search on all the document content easily with a
> queryString query. One can easily disable it, or pick and choose which
> fields end up in the _all field.
>
> But, there are no free bunnies in software, and enabled _all means more CPU
> cycles when indexing, and larger index (but hey, we're distributed, right?
> Just Add Machines(tm) ). I believe _all is very important, especially when
> talking about rich documents, and not simple ones with "title" and
> "content".
>
> The decision currently is to enable _all by default, and have all the fields
> included in _all by default as well. This means that the initial user
> experience would be very good in terms of usability. But, in terms of
> performance when indexing, it will be slower (how much slower? really
> depends on the document and such). What do you think? Does this default
> make sense?

Hey Shay, I think the default setting here is in line with the
ElasticSearch mentality of "it just works". If you're a savvy user,
then the ability to disable this feature is great, but novice users
should be able to fire up an instance and get the search they'd expect
from a Google-like experience.

With that said, I've been working in this space for about 6 years for
Fortune Global 500 companies (mainly with commercial search vendors
but also some Lucene work). Every commercial search vendor provides
some sort of composite field and I completely agree that this is a
must-have feature in ElasticSearch. The ability to select which fields
belong to the composite is equally important but including "all" by
default seems reasonable.

I absolutely love the work you've put into ElasticSearch and once I
get some time to really investigate the architecture, I plan on
providing some help. I think too many folks are thinking about the
"big data" problem in terms of storage and forgetting about inherent
searchability. These data storage systems need an embedded search
layer similar to what's been done with TerraStore.

Awesome work and great vision.

Regards,
-Eric