Multi-word Term Vectors with Word nGrams?

Adam Toy
Hi all,

I'm aiming to build an index that, for each document, will break it down by word ngrams (uni, bi, and tri), then capture term vector analysis on all of those word ngrams. Is that possible with Elasticsearch?

For instance, for a document field containing "The car drives.", I would like to be able to get:

the - 1 instance
car - 1 instance
drives - 1 instance
the car - 1 instance
car drives - 1 instance
the car drives - 1 instance

Thanks in advance!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/23db3079-0475-4a63-bc77-e514bf087359%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Multi-word Term Vectors with Word nGrams?

Adam Toy
Bump.

Also, need to correct my original example as 'The red car drives':

red
car
drives
red car
car drives
red car drives

On Friday, December 5, 2014 12:43:57 PM UTC-5, Adam Toy wrote:
Hi all,
[...]

Re: Multi-word Term Vectors with Word nGrams?

Doug Turnbull
Use the shingle token filter as part of a custom analyzer (you probably want to lowercase as well):
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-custom-analyzer.html
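
For illustration, here is a minimal sketch of what such a custom analyzer and mapping could look like, written as a Python dict that serializes to the JSON body of an index-creation request. It assumes the shingle token filter is the building block meant above. The names shingle_1_to_3, word_shingles, and body are made up for this example, and the mapping layout follows recent Elasticsearch versions (older releases nest the properties under a type name and use "string" instead of "text"), so check it against the version you run.

```python
import json

# Hypothetical names for this sketch: "shingle_1_to_3" (filter),
# "word_shingles" (analyzer), and "body" (document field).
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "shingle_1_to_3": {
                    "type": "shingle",
                    "min_shingle_size": 2,    # bigrams ...
                    "max_shingle_size": 3,    # ... and trigrams
                    "output_unigrams": True,  # keep the single words too
                }
            },
            "analyzer": {
                "word_shingles": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Lowercase first, then build the shingles.
                    "filter": ["lowercase", "shingle_1_to_3"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                "analyzer": "word_shingles",
                "term_vector": "yes",  # store term vectors for this field
            }
        }
    },
}

# Print the JSON that would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```

With term vectors stored for the field, the term vectors API should then report a frequency for each unigram, bigram, and trigram the analyzer emits.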

In Lucene-based search (i.e., Elasticsearch/Solr), "ngram" means character n-grams; for example, the character n-grams of "red" include "r", "re", and "red". What most folks think of as "ngrams" (word n-grams), Lucene calls "shingles".
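
To make the distinction concrete, here is a small pure-Python sketch that imitates both behaviours without touching Elasticsearch. The function names char_ngrams and word_shingles are invented for this example, and the whitespace tokenization is a deliberately naive stand-in for a real analyzer, so treat it as an illustration of the idea rather than of Lucene's exact output.

```python
from collections import Counter


def char_ngrams(word, min_gram=1, max_gram=3):
    """Return all character n-grams of a single word."""
    return [
        word[i:i + n]
        for i in range(len(word))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(word)
    ]


def word_shingles(text, max_size=3):
    """Count lowercased word n-grams (shingles) of sizes 1..max_size."""
    # Naive tokenization: split on whitespace, strip surrounding punctuation.
    tokens = [t.strip(".,;:!?\"'").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    return Counter(
        " ".join(tokens[i:i + n])
        for n in range(1, max_size + 1)
        for i in range(len(tokens) - n + 1)
    )


# Character n-grams: pieces of one word.
print(char_ngrams("red"))

# Shingles: runs of whole words, each with its count.
for term, count in word_shingles("The car drives.").items():
    print(f"{term} - {count} instance")
```

The second call produces the six terms from the original question, each with a count of 1.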

Hope that helps
-Doug

On Tue, Dec 9, 2014 at 2:26 PM, Adam Toy <[hidden email]> wrote:
Bump.
[...]

--
Doug Turnbull
Search & Big Data Architect
