Best approach for weighting tags on a document

Chris Greening
Hi All,

I've been thinking about how to index document tags - for example, a user might add a tag to a document to indicate what the document is about.

I'm assuming that the best way to index these tags would be to have an array of tags.

What I'd like to do though is have some way of weighting the tags - so if multiple people add the same tag it becomes more relevant than a tag that just a few people have added.

What's the best way of doing this?

Thanks
Chris

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 
Re: Best approach for weighting tags on a document

dadoonet
Do you want to display a tag cloud?
You can use a terms facet to get the top 10 tags.

Is that what you are looking for?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 16 June 2013 at 12:22, Chris Greening <[hidden email]> wrote:

Re: Best approach for weighting tags on a document

Chris Greening
I was thinking more along the lines of something like this:

doc1
tags: shoes^10, green^5, mens^2

doc2
tags: shoes^20, green^1, turquoise^6

So the tags would be given a higher weighting based on the number of people who had tagged the item with the same tag value.

If I did a search for "green shoes" then doc1 would come before doc2 in the search results.
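
As a rough illustration of the ranking described above, the following Python sketch scores the two example documents for the query "green shoes". The post does not say how the per-tag weights should be combined, so the sketch tries both a sum and a product; both are assumptions, and the function names are hypothetical.

```python
from math import prod

# Tag weights copied from the doc1/doc2 example in the post.
docs = {
    "doc1": {"shoes": 10, "green": 5, "mens": 2},
    "doc2": {"shoes": 20, "green": 1, "turquoise": 6},
}

def matched_weights(tags, terms):
    """Return the weights of the query terms the document is tagged with."""
    return [tags[t] for t in terms if t in tags]

def rank(docs, terms, combine):
    """Order document names by combined tag weight, highest first."""
    scores = {
        name: combine(matched_weights(tags, terms))
        for name, tags in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores

terms = ["green", "shoes"]
order_sum, scores_sum = rank(docs, terms, sum)     # add the weights
order_prod, scores_prod = rank(docs, terms, prod)  # multiply the weights
```

With these numbers the sum puts doc2 first (21 against 15), while the product puts doc1 first (50 against 20), so the ordering expected here depends on how the weights are combined.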

Cheers
Chris.

On Sunday, June 16, 2013 12:55:38 PM UTC+1, David Pilato wrote:

Re: Best approach for weighting tags on a document

q42jaap
I think you need a custom_score query for every term in the search string.

The custom_score can then take a script to extract the score from the document. You can use the MVEL scripting language to do this, but I've never done that before.

I'd store the values like this:

{
  "tags": [
    { "key": "shoes", "value": 20 },
    { "key": "green", "value": 5 }
  ]
}
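
One way to produce a document in this shape is to count how many users applied each tag. The Python sketch below assumes the raw taggings arrive as (user, tag) pairs; that input format and the function name `build_tag_doc` are hypothetical and do not come from this thread.

```python
from collections import Counter

def build_tag_doc(taggings):
    """Build {"tags": [{"key": ..., "value": ...}]} from (user, tag) pairs.

    Each user counts at most once per tag, so the value is the number of
    distinct users who applied that tag.
    """
    unique_pairs = set(taggings)
    counts = Counter(tag for _user, tag in unique_pairs)
    # Highest count first; ties are ordered by tag name for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {"tags": [{"key": tag, "value": n} for tag, n in ordered]}

taggings = [
    ("alice", "shoes"), ("bob", "shoes"), ("carol", "shoes"),
    ("alice", "green"), ("bob", "green"),
    ("alice", "shoes"),  # repeated tagging by the same user is ignored
]
doc = build_tag_doc(taggings)
```

Counting distinct users rather than raw tagging events is a design choice made for this sketch; it keeps one enthusiastic user from inflating a tag's weight.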

The custom_score query for one term could look like this:

{
  "custom_score" : {
    "script" : ????, // should result in the value that belongs to shoes
    "query" : {
      "filtered" : {
        "filter" : {
          "term" : { "tags.key" : "shoes" }
        }
      }
    }
  }
}

This query uses a filter to take advantage of caching. If you have more than one term, combine the queries in a bool query; that will also combine the scores of the individual custom_score queries.
The MVEL script will probably be a bit verbose: you have to loop over the tags and check whether the search term matches the key.

Note: MVEL scripts can be parameterized, which you should always do. But since you'll have to write a foreach loop, it will be very slow, so be warned.
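
The combination described above can be sketched by building the request body as a plain Python dictionary: one custom_score clause per search term, wrapped in a bool query. This is only a sketch. The script body is left as a caller-supplied placeholder because the thread never settles on one, the parameter name `term` and the function name `build_tag_query` are hypothetical, and the filter-only filtered query simply mirrors the example above, so the exact syntax should be checked against the Elasticsearch version in use.

```python
import json

def build_tag_query(search_string, script):
    """Build a bool query with one custom_score clause per search term.

    `script` is passed through untouched. The term is handed to it as a
    parameter named "term" (a hypothetical name) instead of being spliced
    into the script text.
    """
    clauses = []
    for term in search_string.lower().split():
        clauses.append({
            "custom_score": {
                "script": script,
                "params": {"term": term},
                "query": {
                    "filtered": {
                        "filter": {"term": {"tags.key": term}}
                    }
                },
            }
        })
    # "should" clauses let the bool query combine the per-term scores.
    return {"query": {"bool": {"should": clauses}}}

body = build_tag_query("green shoes", script="<scoring script goes here>")
print(json.dumps(body, indent=2))
```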


An optimization could be to store the boost values in the following format:

{
  "tags": ["shoes", "green", "mens"],
  "tags_score": {
    "shoes": 20,
    "green": 5,
    "mens": 1
  }
}

That way you can filter on "tags" and fetch the scores with a map lookup.
Again, I haven't done anything with MVEL myself, so I'm not sure this kind of map access is even supported...
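
The difference between the two storage formats can be shown in plain Python, used here only as a stand-in for whatever the scoring script would do. Whether MVEL inside Elasticsearch supports the equivalent map access remains unverified, and the function names are hypothetical.

```python
# First format: a list of key/value objects, which needs a scan per lookup.
doc_list = {
    "tags": [
        {"key": "shoes", "value": 20},
        {"key": "green", "value": 5},
    ]
}

# Second format: a flat tag list plus a map from tag to score.
doc_map = {
    "tags": ["shoes", "green", "mens"],
    "tags_score": {"shoes": 20, "green": 5, "mens": 1},
}

def score_by_scan(doc, term, default=0):
    """Walk every tag entry until the key matches (the foreach approach)."""
    for entry in doc["tags"]:
        if entry["key"] == term:
            return entry["value"]
    return default

def score_by_map(doc, term, default=0):
    """Look the score up directly in the tags_score map."""
    return doc["tags_score"].get(term, default)
```

The scan costs time proportional to the number of tags on the document, while the map lookup does not depend on it, which is the point of the second format.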

Good luck,

Jaap


On Sunday, June 16, 2013 4:38:49 PM UTC+2, Chris Greening wrote: