I would like to be able to search for parentheses

Andy Bajka-2 (Sunday, April 14, 2013 2:15:08 PM UTC-7)
I run forum software called Xenforo, and it uses ElasticSearch as an add-on. It works great, and I have enjoyed learning all about ES.

What I would like to be able to do is search messages that contain parentheses. For example, a message will contain:

This is a picture of Andy (Andy).

So I would like to be able to search for (Andy), including the parentheses.

From my research, it looks like the only way to accomplish this is to create an analyzer, as described here:

http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html

If I'm not mistaken, would these be the steps to do what I want?

1) Delete existing index
2) Run the analyzer script
3) Re-index my forum

Thank you kindly for your assistance.

Andy





--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/groups/opt_out.
 
 
Andy Bajka-2
When I request the _mapping, I get the following:

{
  "xenforo113" : {
    "post" : {
      "_source" : {
        "enabled" : false
      },
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string"
        },
        "node" : {
          "type" : "long"
        },
        "thread" : {
          "type" : "long"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    },

What exactly do I need to do to create a new index with the above mapping plus a char map that changes the ( to an underscore? Or is there a better way that would index the parentheses?

Andy Bajka-2
By the way, the developer of Xenforo wrote the following when I asked how I can have parentheses indexed:

That's getting into tokenizers and analysis: http://www.elasticsearch.org/guide/reference/index-modules/analysis/

So it looks like I need to do several things in order to re-index in a way that duplicates what is already there but adds the char mapping.

Andy Bajka-2
Looks like I need to create an analyzer that uses the array type property.

http://www.elasticsearch.org/guide/reference/mapping/array-type/

Andy Bajka-2
Looking at the Xenforo code, I need to replicate this mapping.

    public static $optimizedGenericMapping = array(
        "_source" => array("enabled" => false),
        "properties" => array(
            "title" => array("type" => "string"),
            "message" => array("type" => "string"),
            "date" => array("type" => "long", "store" => "yes"),
            "user" => array("type" => "long", "store" => "yes"),
            "discussion_id" => array("type" => "long", "store" => "yes")
        )
    );

Andy Bajka-2
I've taken a stab at creating my own analyzer mapping:

    {
        "settings" : {
            "index" : {
                "number_of_shards" : 1,
                "number_of_replicas" : 1
            },
            "analysis" : {
                "filter" : {
                    "tweet_filter" : {
                        "type" : "word_delimiter",
                        "type_table": ["( => ALPHA", ") => ALPHA"]
                    }
                },
                "analyzer" : {
                    "tweet_analyzer" : {
                        "type" : "custom",
                        "tokenizer" : "whitespace",
                        "filter" : ["lowercase", "tweet_filter"]
                    }
                }
            }
        },
        "mappings" : {
            "source" : {"enabled" : "false"},
            "properties" : {
                "title" : {"type" : "string"},
                "message" : {"type" : "string"},
                "date" : {"type" : "long", "store" : "yes"},
                "user" : {"type" : "long", "store" : "yes"},
                "discussion_id" : {"type" : "long", "store" : "yes"}
            }
        }
    }

Here is the resulting _mapping, which is not correct:

curl -XGET 'http://localhost:9200/twitter/_mapping?pretty=true'
{
  "twitter" : {
    "source" : {
      "enabled" : false,
      "properties" : { }
    },
    "properties" : {
      "properties" : { }
    }
  }
}




Andy Bajka-2
Also, it said I could not use the underscore in _source, so I changed it to source.

Andy Bajka-2
I'm making progress. It still doesn't match the mapping of the Xenforo ElasticSearch index, but it's getting closer:

{
  "twitter" : {
    "tweet" : {
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string",
          "analyzer" : "tweet_analyzer"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    }
  }
}

Andy Bajka-2
This is a good sign: the filter works.

curl -XGET 'localhost:9200/twitter/_analyze?field=message&pretty=1' -d '(andy)'
{
  "tokens" : [ {
    "token" : "(andy)",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  } ]
}
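
To make it easier to see why "(andy)" survives as one token, here is a rough Python stand-in for the analyzer chain used in this thread (whitespace tokenizer, then lowercase, then a word_delimiter filter whose type_table treats parentheses as ALPHA). It is only a sketch and does not call Elasticsearch. The names PRESERVED, split_on_delimiters and analyze_sketch are made up for illustration, and the real word_delimiter filter has further rules (for example splitting at letter/digit boundaries) that are not modelled here.

```python
# Characters that the thread's type_table re-classifies as ALPHA.
PRESERVED = "()"


def split_on_delimiters(chunk):
    """Simplified word_delimiter: drop delimiter characters and split on them.

    Letters, digits and the PRESERVED characters count as part of a word;
    everything else is treated as a delimiter.
    """
    parts, current = [], []
    for ch in chunk:
        if ch.isalnum() or ch in PRESERVED:
            current.append(ch)
        elif current:
            parts.append("".join(current))
            current = []
    if current:
        parts.append("".join(current))
    return parts


def analyze_sketch(text):
    """Whitespace tokenizer -> lowercase -> simplified word_delimiter."""
    tokens = []
    for chunk in text.split():                 # whitespace tokenizer
        tokens.extend(split_on_delimiters(chunk.lower()))
    return tokens


print(analyze_sketch("(andy)"))
print(analyze_sketch("This is a picture of Andy (Andy)."))
```

In this simplified model the trailing period is dropped as a delimiter while the parentheses stay attached to the word, which is consistent with the _analyze output above for "(andy)". The result for the full sentence is what the sketch produces, not output taken from Elasticsearch.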

Andy Bajka-2
I think I got it!!

curl -XGET 'http://localhost:9200/twitter/_mapping?pretty=true'
{
  "twitter" : {
    "post" : {
      "_source" : {
        "enabled" : false
      },
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string",
          "analyzer" : "tweet_analyzer"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    }
  }
}
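
The thread never shows the request body that produced this mapping, so the following is a reconstruction rather than the exact script that was run. It combines the settings from the earlier attempt with the mapping shown above, built as a Python dict and printed as JSON. Compared with the attempt that gave an empty mapping, the visible differences are that "_source" and "properties" sit under a type name ("post") and that "message" names the analyzer. The variable names index_body and request_body are made up for illustration.

```python
import json

# Settings copied from the earlier attempt in this thread.
index_body = {
    "settings": {
        "index": {"number_of_shards": 1, "number_of_replicas": 1},
        "analysis": {
            "filter": {
                "tweet_filter": {
                    "type": "word_delimiter",
                    "type_table": ["( => ALPHA", ") => ALPHA"],
                }
            },
            "analyzer": {
                "tweet_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "tweet_filter"],
                }
            },
        },
    },
    # Mapping copied from the _mapping output above: everything is nested
    # under the type name "post", and "_source" keeps its underscore.
    "mappings": {
        "post": {
            "_source": {"enabled": False},
            "properties": {
                "title": {"type": "string"},
                "message": {"type": "string", "analyzer": "tweet_analyzer"},
                "date": {"type": "long", "store": "yes"},
                "user": {"type": "long", "store": "yes"},
                "discussion_id": {"type": "long", "store": "yes"},
            },
        }
    },
}

# JSON text that could be supplied as the body when creating the index.
request_body = json.dumps(index_body, indent=2)
print(request_body)
```

This follows the syntax used in the thread (2013). Later Elasticsearch releases replaced the "string" field type and removed mapping types, so the body would need adjusting there.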

Ivan Brusic
Glad we can help you out. :)

You will get more flexibility by switching from the whitespace tokenizer to a pattern tokenizer, so that you can split on additional characters such as commas and periods in addition to whitespace.

-- 
Ivan
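
As a rough illustration of the difference described above, here is a small Python comparison of splitting on whitespace only versus splitting on a regular expression that also treats commas and periods as separators. It uses Python's re module as an analogue and does not call Elasticsearch. The pattern and the names SPLIT_PATTERN and pattern_tokenize are assumptions chosen for this sketch, not something given in the thread.

```python
import re

# Assumed separator pattern: runs of whitespace, commas or periods.
SPLIT_PATTERN = r"[\s,.]+"


def pattern_tokenize(text, pattern=SPLIT_PATTERN):
    """Split text wherever the separator pattern matches, dropping empty pieces."""
    return [piece for piece in re.split(pattern, text) if piece]


sample = "Pictures: Andy,Bob (Andy)."

# Whitespace only: "Andy,Bob" stays together and "(Andy)." keeps its period.
print(sample.split())

# Pattern based: the comma and the period also act as separators.
print(pattern_tokenize(sample))
```

In an index definition the same idea would be expressed by configuring a tokenizer of type "pattern" with a separator regex. Whether this works better than the whitespace tokenizer depends on how the forum's text is punctuated.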


On Sun, Apr 14, 2013 at 6:59 PM, Andy Bajka <[hidden email]> wrote:
I think I got it!!

Andy Bajka-2
Hi Ivan,

Thank you for the suggestion. So far I'm pretty happy with the results I get from the whitespace tokenizer. I think most of the terms that we search for on my forum have whitespace around them, so perhaps it's fine the way it is. I'll continue to monitor my results.

On Monday, April 15, 2013 8:16:35 AM UTC-7, Ivan Brusic wrote:
Glad we can help you out. :)
