What is the best way to store (changing) attributes?


What is the best way to store (changing) attributes?

joa

I want to store and query different attributes per document. Something like this:

{
    name: "doc1",
    metadata: [
        { color: "red" },
        { data: [ "value1", "value2", "value3" ] },
        { size: 500 },
        { avail: true },
    ]
},
...
{
    name: "doc4980",
    metadata: [
        { otherValues: [ 55, 33 ] },
        { important: true },
    ]
}

The metadata array may differ between many documents, because users define its entries whenever a new attribute is needed.

Using the attribute name as the field name (the left side of the JSON) may lead to high memory usage, so I also tried moving the names to the right side. But I think the following will not work, because the value (v) field would hold different types (int, string, ...):

"_source" : {
    "name": "doc4980",
    "metadata": [
        { k: "otherValues", v: [ 55, 33 ] },
        { k: "important", v: true }
    ]
}
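
To make the two layouts concrete, here is a minimal Python sketch (not from the thread) that rewrites the first structure into the k/v structure and lists which value types end up sharing the single v field. The function names to_kv and value_types are hypothetical, and the sketch only inspects Python types; it does not talk to Elasticsearch.

```python
def to_kv(doc):
    """Rewrite {"metadata": [{name: value}, ...]} as [{"k": name, "v": value}, ...]."""
    entries = []
    for item in doc["metadata"]:
        for name, value in item.items():
            entries.append({"k": name, "v": value})
    return {"name": doc["name"], "metadata": entries}


def value_types(kv_doc):
    """Collect the Python type names that occur in the shared "v" field."""
    seen = set()
    for entry in kv_doc["metadata"]:
        values = entry["v"] if isinstance(entry["v"], list) else [entry["v"]]
        seen.update(type(v).__name__ for v in values)
    return seen


# The doc4980 example from the post above.
doc = {
    "name": "doc4980",
    "metadata": [{"otherValues": [55, 33]}, {"important": True}],
}
kv_doc = to_kv(doc)
print(kv_doc["metadata"])
print(sorted(value_types(kv_doc)))  # more than one type shares the "v" field
```

If more than one type name is printed, the single v field would need one mapping that fits all of them, which is the concern raised above.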




--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.

Re: What is the best way to store (changing) attributes?

ppearcy
Hi, 
  Your first structure makes more sense, but I would probably make metadata an object instead of an array. 
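
As a rough illustration of the object-instead-of-array suggestion, the following Python sketch merges the single-key entries of the metadata array into one object. The function name metadata_as_object is hypothetical, and the sketch assumes that no attribute name occurs twice within a document (a later entry would overwrite an earlier one).

```python
def metadata_as_object(doc):
    """Merge [{name: value}, ...] into a single {name: value, ...} object."""
    merged = {}
    for item in doc["metadata"]:
        merged.update(item)  # assumes attribute names are unique per document
    return {"name": doc["name"], "metadata": merged}


# The doc1 example from the original post.
doc1 = {
    "name": "doc1",
    "metadata": [
        {"color": "red"},
        {"data": ["value1", "value2", "value3"]},
        {"size": 500},
        {"avail": True},
    ],
}
print(metadata_as_object(doc1))
```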

Your second approach would work OK if:
- There is a single way you want to search these values
- You set up multi-fields on values and analyze each value multiple ways. There would likely be hiccups around numbers and dates. 

It really comes down to the search requirements on that data, though. 

Could you elaborate more on the memory issues you are running into? I'm not sure why these two structures would have memory profiles that differ by much.

Thanks,
Paul



(In reply to joa's post of Monday, September 9, 2013 7:55:46 AM UTC-4.)

Re: What is the best way to store (changing) attributes?

joa
Hi, I rejected the first structure due to this comment on one of my other questions: https://groups.google.com/d/msg/elasticsearch/pUg9GbDOMf8/QlKPkftm3e4J. What is the advantage of using an object instead of an array as you suggested? Thanks!


Re: What is the best way to store (changing) attributes?

ppearcy
Using an object is simply a more straightforward abstraction; there is likely no practical difference.

From that thread, I see there are two distinctions:
- Lots of document types leading to higher memory. I personally don't use lots of types, so can't really comment one way or the other. 
- Lots of document fields leading to a large number of field names in the index. I've had no issue with mappings with ~50 fields and I'd be surprised if this really became a pain point. This is really just config that gets applied during search/indexing, and each different field gets its own inverted index (that is likely where most of the overhead comes from). Have you tested around this and had issues?

How many distinct metadata names (keys) do you expect to have? There is no clear-cut number where things will start to have issues, and I'm sure it depends on the amount of resources in the cluster. I wouldn't be concerned about having fewer than 100 fields. I'd really test with more than that to see where things start to take a nose dive.
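
One way to answer the "how many distinct keys" question for an existing data set is to count them before indexing. The Python sketch below assumes documents in the array layout from the original post; the function name distinct_metadata_keys is hypothetical.

```python
def distinct_metadata_keys(docs):
    """Return the set of attribute names used across all documents."""
    keys = set()
    for doc in docs:
        for item in doc["metadata"]:
            keys.update(item.keys())
    return keys


# The two example documents from the original post.
docs = [
    {
        "name": "doc1",
        "metadata": [
            {"color": "red"},
            {"data": ["value1", "value2", "value3"]},
            {"size": 500},
            {"avail": True},
        ],
    },
    {
        "name": "doc4980",
        "metadata": [{"otherValues": [55, 33]}, {"important": True}],
    },
]
keys = distinct_metadata_keys(docs)
print(len(keys), sorted(keys))
```

With the attribute name used as the field name, each of these keys would become its own field in the mapping.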

There are definitely trade-offs with both approaches, but the nested k/v approach ends up being more complicated, and it is the route I would take when evaluating the other possibilities fails.

Thanks,
Paul

(In reply to joa's post of Monday, September 9, 2013 5:35:31 PM UTC-4.)

Re: What is the best way to store (changing) attributes?

joa
The number of fields can be 1500 (!) or higher. (That comes from, e.g., 100 projects with 15 metadata entries per project.) I am trying to find a robust structure that scales out well later on.

I think the biggest problem with the k/v approach is that the values can only be stored as strings, so I cannot run queries like greater-than or between. On the other hand, having 1500 (or more) different fields feels wrong?
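
The concern about string-only values can be illustrated outside Elasticsearch. The short Python sketch below assumes that string values are compared lexicographically, as happens when a range is evaluated over string terms. The numbers 500 and 55 are taken from the examples earlier in the thread; 1000 is added purely for illustration.

```python
sizes_as_strings = ["500", "1000", "55"]
sizes_as_numbers = [500, 1000, 55]

# Lexicographic order compares character by character, so "1000" sorts first.
sorted_strings = sorted(sizes_as_strings)
# Numeric order compares magnitudes.
sorted_numbers = sorted(sizes_as_numbers)

print(sorted_strings)   # ['1000', '500', '55']
print(sorted_numbers)   # [55, 500, 1000]
print("500" > "1000")   # True: the string comparison disagrees with the numbers
print(500 > 1000)       # False
```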


Re: What is the best way to store (changing) attributes?

ppearcy
If it were me, I would test each setup to make sure I was going with the right approach.

To handle your concern with k/v, I would have the value be an object, and then have string, date, and numeric attributes with the appropriate analyzers. When you index your data, you will need to take a look at the value and set the fan-out values correctly, e.g.:
{
	k: "NumericData", 
	v: { 
		number: 1234
	}
}

{
	k: "StringData", 
	v: { 
		string: "Some user text"
	}
}
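
The fan-out step described above could be sketched in Python roughly as follows. This is an illustration under assumptions, not Paul's implementation: the sub-field names number and string come from the examples above, while date and boolean are assumed names, and lists are assumed to hold values of a single type. The function names subfield_for and fan_out are hypothetical.

```python
from datetime import date


def subfield_for(value):
    """Pick the typed sub-field name for a single value."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"  # assumed sub-field name
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, date):  # also matches datetime
        return "date"  # assumed sub-field name
    return "string"


def fan_out(key, value):
    """Wrap a value in an object whose only sub-field names its type."""
    values = value if isinstance(value, list) else [value]
    if not values:
        return {"k": key, "v": {}}
    name = subfield_for(values[0])  # assumes all list items share one type
    if name == "date":
        values = [v.isoformat() for v in values]
    elif name == "string":
        values = [str(v) for v in values]
    typed = values if isinstance(value, list) else values[0]
    return {"k": key, "v": {name: typed}}


print(fan_out("NumericData", 1234))
print(fan_out("StringData", "Some user text"))
print(fan_out("otherValues", [55, 33]))
print(fan_out("important", True))
```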

You'll have to have smarts on the query side to span these fields or target specific ones (dates/numbers). Don't forget to index and query the k/v setup using nested documents.
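
As a sketch of what the nested setup might look like, the Python snippet below builds a mapping and a "between" query as plain dictionaries. The mapping is written in the syntax of recent Elasticsearch releases (keyword/text types, no mapping types), which differs from the syntax available when this thread was written. The field layout (metadata.k, metadata.v.number, and so on) follows the examples above, and the function name numeric_range_query is hypothetical. Nothing here is sent to a cluster.

```python
import json

# Assumed mapping: "metadata" is nested so each k/v pair is matched as a unit.
MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "keyword"},
            "metadata": {
                "type": "nested",
                "properties": {
                    "k": {"type": "keyword"},
                    "v": {
                        "properties": {
                            "number": {"type": "double"},
                            "string": {"type": "text"},
                            "date": {"type": "date"},
                        }
                    },
                },
            },
        }
    }
}


def numeric_range_query(key, gte, lte):
    """Match documents with an entry where k == key and gte <= v.number <= lte."""
    return {
        "query": {
            "nested": {
                "path": "metadata",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"metadata.k": key}},
                            {"range": {"metadata.v.number": {"gte": gte, "lte": lte}}},
                        ]
                    }
                },
            }
        }
    }


body = numeric_range_query("NumericData", 1000, 2000)
print(json.dumps(body, indent=2))
```

Because the term and range clauses sit inside the same nested query, they have to match within a single metadata entry rather than across different entries of the same document.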
Hope this helps.
Best Regards,
Paul
(In reply to joa's post of Monday, September 9, 2013 6:15:12 PM UTC-4.)

Re: What is the best way to store (changing) attributes?

joa
Thanks for your help! I've already started testing both variants with dummy data. Apart from how the complexity differs between the variants, how can I test which one is the most memory-friendly?

Do I need to compare the different results of http://localhost:9200/_stats?pretty and http://localhost:9200/_nodes?all=true&pretty?
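
One possible way to do such a comparison, assuming each variant is loaded into its own index, is to pull a few numbers out of the parsed _stats response and diff them. The Python sketch below is only an illustration: the index names variant_fields and variant_kv are hypothetical, the sample response contains made-up placeholder numbers rather than measurements, and the keys it reads (store.size_in_bytes and fielddata.memory_size_in_bytes under each index's total section) should be checked against the response of the Elasticsearch version in use.

```python
def summarize(stats, index):
    """Extract store size and fielddata memory for one index from a _stats response."""
    total = stats["indices"][index]["total"]
    return {
        "store_bytes": total["store"]["size_in_bytes"],
        "fielddata_bytes": total["fielddata"]["memory_size_in_bytes"],
    }


def compare(stats, index_a, index_b):
    """Return index_b minus index_a for each metric."""
    a = summarize(stats, index_a)
    b = summarize(stats, index_b)
    return {metric: b[metric] - a[metric] for metric in a}


# Placeholder response with made-up numbers, shaped like a parsed _stats reply.
sample_stats = {
    "indices": {
        "variant_fields": {
            "total": {
                "store": {"size_in_bytes": 1000},
                "fielddata": {"memory_size_in_bytes": 200},
            }
        },
        "variant_kv": {
            "total": {
                "store": {"size_in_bytes": 1500},
                "fielddata": {"memory_size_in_bytes": 100},
            }
        },
    }
}
diff = compare(sample_stats, "variant_fields", "variant_kv")
print(diff)
```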
