Does it make sense to index whole document

Janusz Dalecki

Hi,

I have created a river for my MongoDB Asset collection so that it gets indexed by Elasticsearch.

To do that, I used the following command:

# Create river for Asset collection
curl -XPUT 'http://localhost:9200/_river/asset_river/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "my_database",
    "collection": "Asset"
  },
  "index": {
    "name": "asset_index",
    "type": "Asset"
  }
}'


I think this will index every field in all my Asset documents. Is that right?

Does it make sense to do that? If I understand it correctly, I will end up with so many indexed fields that searching might be as slow as just scanning through the documents themselves. Am I right?

Regards,

Janusz

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Re: Does it make sense to index whole document

Ivan Brusic
You can supply a custom mapping for that index BEFORE the index is created, denoting which fields should and should not be indexed. You cannot change the source document, but you do have control over what fields are indexed.

-- 
Ivan


(In reply to Janusz Dalecki, Mon, Aug 12, 2013 at 7:57 PM)

Re: Does it make sense to index whole document

Ivan Brusic
A slight correction to my last email: you can create the index with a custom mapping BEFORE the river is created. I said "before the index is created", which is wrong.
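A minimal sketch of that order of operations, assuming the Elasticsearch 0.90-era API used in this thread: first create asset_index yourself, e.g. with curl -XPUT 'http://localhost:9200/asset_index' -d '...' and a request body like the one below, and only then create the river, whose "index" section already points at asset_index and type Asset. The single className field is a placeholder; list the fields whose defaults you actually want to change.

```json
{
  "mappings": {
    "Asset": {
      "properties": {
        "className": { "type": "string" }
      }
    }
  }
}
```

Fields you do not list are still added by dynamic mapping when the river first sends them, so the explicit mapping is mainly the place to override the defaults for specific fields.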

-- 
Ivan


(In reply to Ivan Brusic, Tue, Aug 13, 2013 at 4:12 PM)

Re: Does it make sense to index whole document

Janusz Dalecki
In reply to this post by Ivan Brusic

Hi Ivan,

  1. So would you agree with me that indexing all fields (columns) in all documents is overkill?
  2. Is there comprehensive documentation on how to set which fields are analysed? Do I have to do it for every field? Is there a global flag I can set to not analyse fields, and then set individual fields to be analysed?

Regards,

Janusz



Re: Does it make sense to index whole document

Ivan Brusic
Hi Janusz,

Answers inline.


On Tue, Aug 13, 2013 at 5:38 PM, JD <[hidden email]> wrote:

Hi Ivan,

  1. So would you agree with me that indexing all fields (columns) in all documents is overkill?

The biggest downside is that your overall index will be bigger. If your Lucene index does not fit into memory, Elasticsearch could swap to disk more frequently. Field/filter caches will not grow for fields that are never queried (ignoring the _all field for now).

  2. Is there comprehensive documentation on how to set which fields are analysed? Do I have to do it for every field?
Which fields should be indexed is up to your application. Not analyzed and not indexed are two different things: a not-analyzed field is still indexed, but its value does not go through the tokenization/filtering process, so it is indexed as a single exact term. Your custom mapping marks not only which fields should be indexed, but also how they should be analyzed. By default, indexed string fields use the Standard analyzer, but you will soon discover that certain fields need a different analyzer, or should not be analyzed at all (while still being indexed).
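To make the difference concrete, here is a sketch of a type mapping with one string field of each kind, assuming the 0.90-era syntax in which "index" takes analyzed, not_analyzed or no. The field names description, assetCode and internalNotes are hypothetical.

```json
{
  "Asset": {
    "properties": {
      "description": {
        "type": "string",
        "index": "analyzed",
        "analyzer": "english"
      },
      "assetCode": {
        "type": "string",
        "index": "not_analyzed"
      },
      "internalNotes": {
        "type": "string",
        "index": "no"
      }
    }
  }
}
```

Here description is tokenized, lowercased and stemmed, so full-text queries match individual words; assetCode is indexed as one exact term, which suits filters and term queries on whole codes; internalNotes cannot be searched at all, but is still returned as part of _source.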
     Is there a global flag I can set to not analyse fields, and then set individual fields to be analysed?
Look into dynamic templates:
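As a sketch of the kind of "global flag" dynamic templates provide, assuming 0.90-era syntax (the template name strings_not_analyzed and the field description are hypothetical): the template maps every newly seen string field as not_analyzed, while a field mapped explicitly under properties keeps its own analyzed settings, since dynamic templates only apply to fields that are not already mapped.

```json
{
  "Asset": {
    "dynamic_templates": [
      {
        "strings_not_analyzed": {
          "match": "*",
          "match_mapping_type": "string",
          "mapping": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    ],
    "properties": {
      "description": {
        "type": "string",
        "index": "analyzed"
      }
    }
  }
}
```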


Cheers,

Ivan


Re: Does it make sense to index whole document

Janusz Dalecki
In reply to this post by Ivan Brusic
Hi,
So, in this simple mapping example:
{
  "Asset" : {
    "_all" : {
      "analyzer" : "english"
    },
    "assetCategoryId" : {
      "type" : "long", indexed : false
    },
    "className" : {
      "type" : "string"
    }
  }
}

I have added an 'indexed' property set to false for the field 'assetCategoryId'.
Is what I have done correct? From now on, 'assetCategoryId' should not be indexed, as I never search on that field.
Regards,
Janusz


Re: Does it make sense to index whole document

Ivan Brusic
The setting to have a field not indexed is "index": "no".

"assetCategoryId" : { "type" : "long", "index": "no" }

The className field uses the default settings, so you could even leave it out. It depends on whether you like explicit or concise mappings (I prefer the former, just as you have it now).
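Applying that fix to the whole snippet, a sketch of the corrected type mapping might look like this, assuming the 0.90-era mapping format, in which field definitions sit inside a "properties" object rather than directly under the type name as in the snippet above:

```json
{
  "Asset": {
    "_all": {
      "analyzer": "english"
    },
    "properties": {
      "assetCategoryId": {
        "type": "long",
        "index": "no"
      },
      "className": {
        "type": "string"
      }
    }
  }
}
```

This type mapping can go under "mappings" when asset_index is created, before the river starts sending documents. With "index": "no", assetCategoryId cannot be searched, but it remains in _source and is still returned with each hit.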

Cheers,

Ivan


(In reply to Janusz Dalecki, Tue, Aug 13, 2013 at 8:02 PM)

Re: Does it make sense to index whole document

Janusz Dalecki
In reply to this post by Ivan Brusic

Hi,

Thank you very much, Ivan, for all the info you have provided in this thread.

Just one more thing: is there documentation on this syntax? I have tried to find out what kinds of fields I can put in a mapping file and what the valid values are, but I couldn't.

Regards,

Janusz



Re: Does it make sense to index whole document

Ivan Brusic
I meant to include some links, but I forgot.

The core types documentation has information on the settings: http://www.elasticsearch.org/guide/reference/mapping/core-types/

Elasticsearch is built on top of Lucene, so analysis concepts such as analyzers, tokenizers and filters are also explained in the Lucene documentation. That said, Lucene's documentation is not very good. Solr is also built on top of Lucene and has decent docs: http://docs.lucidworks.com/display/solr/Apache+Solr+Reference+Guide

Cheers,

Ivan



(In reply to Janusz Dalecki, Wed, Aug 14, 2013 at 4:45 PM)