Ignore Hate terms

Ignore Hate terms

hemantsingal
Is there a way to exclude documents containing hate terms like fag**t, Ni**er etc. from my search results without having to specify the terms in each and every query?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 
Re: Ignore Hate terms

Alexander Reelsen-2
Hey,

You can ignore any words at index time (note: they will still be in the source of your document if a search hit is returned).
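One way to drop words at index time is a custom analyzer with a stop token filter. The sketch below only builds the index settings as a Python dict; the filter name, the analyzer name and the placeholder terms are assumptions made for illustration, not something taken from this thread.

```python
import json

# Placeholder terms; substitute your own list.
BLOCKED_TERMS = ["badword1", "badword2"]


def build_index_settings(blocked_terms):
    """Return index settings whose analyzer drops the blocked terms."""
    return {
        "analysis": {
            "filter": {
                # Hypothetical filter name; "stop" removes the listed tokens.
                "blocked_terms_stop": {
                    "type": "stop",
                    "stopwords": list(blocked_terms),
                }
            },
            "analyzer": {
                # Hypothetical analyzer name.
                "no_blocked_terms": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "blocked_terms_stop"],
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_settings(BLOCKED_TERMS), indent=2))
```

This only keeps the terms out of the index. A document that contains them can still be returned when it matches on other words, and the original text stays in the stored source.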


Re: Ignore Hate terms

hemantsingal
Not indexing the terms is good, but I really can't show these documents at all, so I would still have to put these terms in every query.

Re: Ignore Hate terms

David G Ortega
Hi, would you like to post it on Stack Overflow? Personally, I would prefer to answer there.

Re: Ignore Hate terms

vineeth mohan
Why don't you make sure that such documents are not indexed? Or at least periodically run a delete-by-query on all the documents that need to be blacklisted.

Thanks
           Vineeth
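A periodic delete-by-query needs a query that matches every blacklisted document. The sketch below builds such a terms query as a Python dict, assuming the field is called myTextField (borrowed from later in the thread) and is indexed with lowercased tokens. The endpoint and the exact body wrapper differ between Elasticsearch versions, so only the query itself is shown.

```python
import json


def build_blacklist_query(field, blocked_terms):
    """Build a terms query matching documents that contain any blocked term.

    The terms are lowercased and de-duplicated because a terms query is not
    analyzed; this assumes the field was indexed with lowercased tokens.
    """
    terms = sorted({term.lower() for term in blocked_terms})
    return {"terms": {field: terms}}


if __name__ == "__main__":
    # Placeholder terms; substitute your own list.
    query = build_blacklist_query("myTextField", ["badword1", "BadWord2"])
    print(json.dumps(query, indent=2))
```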

Re: Ignore Hate terms

hemantsingal
> Why don't you make sure that such documents are not indexed?
- How do I do that?

> Or at least periodically run a delete-by-query on all the documents that need to be blacklisted.
- Deletes are expensive and not real time.

Re: Ignore Hate terms

Clinton Gormley-2
On Thu, 2013-03-14 at 03:26 -0700, cavebird wrote:
> Why dont you make sure that such documents are not indexed.
> - How do I do that?

Check them in your application before you index them.
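An application-side check could look roughly like the sketch below. The term list, the field name and the helper names are all hypothetical, and the word splitting is deliberately naive (whole words only, no stemming or obfuscation handling).

```python
import re

# Placeholder list; in practice this would be loaded from shared configuration.
BLOCKED_TERMS = {"badword1", "badword2"}

_WORD_RE = re.compile(r"[a-z0-9']+")


def contains_blocked_term(text, blocked_terms=BLOCKED_TERMS):
    """Return True if any whole word of the text is a blocked term."""
    words = _WORD_RE.findall(text.lower())
    return any(word in blocked_terms for word in words)


def filter_for_indexing(docs, field="text"):
    """Yield only the documents that are safe to send to the index."""
    for doc in docs:
        if not contains_blocked_term(doc.get(field, "")):
            yield doc


if __name__ == "__main__":
    docs = [{"text": "a clean post"}, {"text": "this is a BADWORD1 text"}]
    print(list(filter_for_indexing(docs)))
```

Every entry point that writes to the index has to run the same check against the same list, which is the practical cost of this approach.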

Re: Ignore Hate terms

hemantsingal
Well, the data comes into the system from multiple entry points and through different stacks as well (RoR, Java).
Also, I would like the ability to modify the terms list in the future.

Re: Ignore Hate terms

David G Ortega
In reply to this post by hemantsingal
In case you want a 100% ES solution, or want to keep all your data available, here is a possible solution:

1) Create an analyzer that transforms all your bad words into the same token

{
  "index" : {
    "analysis" : {
      "char_filter" : {
        "my_mapping" : {
          "type" : "mapping",
          "mappings" : ["badword1=>bad", "badword2=>bad"]
        }
      },
      "analyzer" : {
        "isBad" : {
          "tokenizer" : "standard",
          "filter" : ["lowercase", "asciifolding", "unique"],
          "char_filter" : ["my_mapping"]
        }
      }
    }
  }
}

2) Map your field or fields as multi_field, with an isBad sub-field that uses your isBad analyzer

"myTextField" : {
  "type" : "multi_field",
  "fields" : {
    "myTextField" : { "type" : "string" },
    "isBad" : { "type" : "string", "index_analyzer" : "isBad" }
  }
}

3) Filter the flagged documents out at search time

{
  "from" : 0,
  "size" : 10,
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "not" : { "term" : { "myTextField.isBad" : "bad" } }
      }
    }
  }
}
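To avoid repeating the exclusion filter by hand, the application can wrap every query in one place. The sketch below is only an illustration of that idea: the helper names are hypothetical, and it reuses the filtered query and not filter syntax from the steps above, which newer Elasticsearch versions replaced with a bool query and must_not.

```python
import json


def exclude_flagged(query, flag_field="myTextField.isBad", flag_term="bad"):
    """Wrap a query so that documents carrying the flag token are filtered out."""
    return {
        "filtered": {
            "query": query,
            "filter": {"not": {"term": {flag_field: flag_term}}},
        }
    }


def build_search(query=None, start=0, size=10):
    """Build a full search body; defaults to match_all when no query is given."""
    if query is None:
        query = {"match_all": {}}
    return {"from": start, "size": size, "query": exclude_flagged(query)}


if __name__ == "__main__":
    print(json.dumps(build_search({"match": {"myTextField": "hobbit"}}), indent=2))
```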

Re: Ignore Hate terms

hemantsingal
I will definitely try this and get back to you. :)

Re: Ignore Hate terms

vineeth mohan
I am not getting the whole concept here.
For this to work, shouldn't I write the text to both the myTextField and isBad fields?

Thanks
           Vineeth

Re: Ignore Hate terms

David G Ortega
When you map a field as multi_field and send a document with that field name, ES internally creates the sub-fields from the mapping definition. This is what happens:

You send:
{text: "this is a badword1 text"}

In ES:
{text.text: [this, is, a, badword1, text]}
{text.isBad: [this, is, a, bad, text]}

Obviously "bad" is far too generic a word; it is better to map to something like tagFlagged instead of "bad". With tagFlagged in the mapping, another example looks like this:

You send:
{text: "this is a badword1, badword2, badword3  text"}

In ES:
{text.text: [this, is, a, badword1, badword2, badword3, text]}
{text.isBad: [this, is, a, tagflagged, text]} (after lowercase and unique)

Since the search filters out every document that has the flag token in text.isBad, no flagged posts are going to appear. Note that the lowercase filter indexes tagFlagged as tagflagged, and a term filter is not analyzed, so the filter has to look for the lowercased token.
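The analysis chain described above can be imitated in plain Python to see which tokens end up in the isBad sub-field. This is a rough stand-in and not the real analyzer: the regular expression only approximates the standard tokenizer, asciifolding is left out, and the function name is hypothetical.

```python
import re


def analyze_is_bad(text, mappings):
    """Imitate the isBad analyzer: mapping char filter, tokenizer, lowercase, unique."""
    # Char filter: replace each mapped string before tokenizing.
    for source, target in mappings.items():
        text = text.replace(source, target)
    # Tokenizer: split on anything that is not a word character.
    tokens = re.findall(r"\w+", text)
    # Lowercase filter.
    tokens = [token.lower() for token in tokens]
    # Unique filter: keep only the first occurrence of each token.
    seen = set()
    unique_tokens = []
    for token in tokens:
        if token not in seen:
            seen.add(token)
            unique_tokens.append(token)
    return unique_tokens


if __name__ == "__main__":
    mappings = {name: "tagFlagged" for name in ("badword1", "badword2", "badword3")}
    print(analyze_is_bad("this is a badword1, badword2, badword3  text", mappings))
```

One detail this makes visible: because the lowercase filter runs after the char filter, the stored token is tagflagged, so a term filter would need that lowercased value.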


Re: Ignore Hate terms

Michael Sick
Also, you should consider the impact of false positives on your system. Take the following phrase from The Hobbit: "The faggots are reeking". Perhaps the elves are homophobic, but research shows that they are just admiring burning wood.

Since analysis for context and sentiment is difficult, you might set up a review system where the words that you are trying to exclude change a state field, something like censorStatus=ok,review,notOk, so that on most reads you only retrieve the "ok" value, and some stewards review the posts that require it and either allow or disallow them. Without knowing the context of your system, I'm not sure how likely it is that you need to care, but if you do, you'll find that being "smart" about the exclusions can be a pain.
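The review workflow can be sketched as a small state machine over the censorStatus field. Everything except the field name and its three values is an assumption here: the term list is a placeholder and the helper names are hypothetical.

```python
import re

# Placeholder list of terms that should trigger a human review.
FLAG_TERMS = {"badword1", "badword2"}


def initial_status(text, flag_terms=FLAG_TERMS):
    """Return "review" when a flagged word is present, otherwise "ok"."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return "review" if words & flag_terms else "ok"


def apply_review(doc, allowed):
    """Record a steward's decision on a copy of the document."""
    reviewed = dict(doc)
    reviewed["censorStatus"] = "ok" if allowed else "notOk"
    return reviewed


def visible(docs):
    """Return only the documents that normal reads should retrieve."""
    return [doc for doc in docs if doc["censorStatus"] == "ok"]


if __name__ == "__main__":
    text = "The badword1 are reeking"
    doc = {"text": text, "censorStatus": initial_status(text)}
    print(doc["censorStatus"], len(visible([doc])))
    print(len(visible([apply_review(doc, allowed=True)])))
```

Keeping the automatic step limited to "ok" or "review" leaves the final allow or disallow decision with a person, which is the point of the suggestion above.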


Re: Ignore Hate terms

mrflip
You may enjoy(?) the lists of obscene words I've gathered here: http://www.infochimps.com/datasets/list-of-dirty-obscene-banned-and-otherwise-unacceptable-words

I believe this kind of thing -- for which regexps and lookup tables fall short, yet for which proper NLP is too much work -- is perfect for the percolate feature.

As you index each document, percolate it against rule sets as complex or simple-term-matchy as you like, and tag matching documents with a "probably offensive" flag. Now exclude such documents altogether, or let visitors opt in/out to flagged documents.

Flip
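The tag-on-index idea can be illustrated without a cluster. The sketch below is a plain-Python stand-in for percolation, not a call to the ES percolate API: the rules are ordinary predicates, and the rule names, field names and placeholder terms are all hypothetical.

```python
import re


def words(text):
    """Split text into a set of lowercased words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


# Hypothetical rule set: rule name -> predicate over the document text.
RULES = {
    "blocked_term": lambda text: bool(words(text) & {"badword1", "badword2"}),
    "shouted_insult": lambda text: "badword3" in words(text) and text.isupper(),
}


def percolate(text, rules=RULES):
    """Return the sorted names of all rules that match the text."""
    return sorted(name for name, matches in rules.items() if matches(text))


def tag_document(doc, field="text"):
    """Return a copy of the document tagged with the rules it matched."""
    matched = percolate(doc[field])
    tagged = dict(doc)
    tagged["matched_rules"] = matched
    tagged["probably_offensive"] = bool(matched)
    return tagged


if __name__ == "__main__":
    print(tag_document({"text": "this is a badword1 text"}))
```

With the flag stored on the document, a search can either exclude flagged documents or expose the flag so that visitors can opt in or out.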

Re: Ignore Hate terms

David G Ortega
I think we both do the same thing, since we both flag the doc, but I love your solution, Flip. Brilliant.

Re: Ignore Hate terms

vineeth mohan
Thanks David,
This is a handy piece of knowledge.

Thanks
            Vineeth

Re: Ignore Hate terms

David G Ortega
You are welcome Vineeth :)

Re: Ignore Hate terms

David Zachariah
This post has NOT been accepted by the mailing list yet.
Hi David,

Thanks for the useful information. For the censored words, if I have a huge list of offensive words, do I need to list all of them in the "term" :{"badw1"=>"bad", "badw2"=>"bad", ..., "badw9999"=>"bad"}, or is there another way of doing this tedious task, such as a file?

Second, what is the performance of percolate? Is it acceptable to use it for sentiment analysis?

Thanks,

David