fuzzy matching and direct hit ranking

Woody Peterson
I'm trying to implement fuzzy matching, and wanted to get a sanity check, as I'm hitting what was initially a surprising use case.

Let's say I have an index with documents: [{'name' => 'Coleman'}, {'name' => 'Boleman'}]. Doing a fuzzy_like_this search for 'coleman' will non-deterministically return either document first, whereas before I read up on the details of fuzzy search I would have expected the results to have taken distance into account.

After reading some of the documentation and relevant posts on this forum, I understand that what it's doing is expanding the search to all terms in the index within a percentage-wise distance of the word. So in the above example, my current understanding is that a search for 'coleman' is literally the same thing as searching for 'coleman' and 'boleman'.

First of all, is this correct? Second, how do I achieve my desired behavior?

My first thought is to do a dis_max query with both a text and fuzzy_like_this. Would anyone pursue a different strategy instead?
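
A minimal sketch of the dis_max idea above, written in Python only to assemble the JSON request body. The `text` and `fuzzy_like_this` query names come from this thread; the `name` field, the `boost` value of 2.0, and the helper `build_dis_max_query` are assumptions for illustration, not a tested recommendation.

```python
import json


def build_dis_max_query(term, field="name", exact_boost=2.0):
    """Assemble a dis_max body with a boosted exact clause and a fuzzy clause.

    dis_max scores a document by its best-scoring clause, so boosting the
    exact clause is meant to lift direct hits above fuzzy-only hits.
    The boost value is a guess and would need tuning on real data.
    """
    exact_clause = {"text": {field: {"query": term, "boost": exact_boost}}}
    fuzzy_clause = {"fuzzy_like_this": {"fields": [field], "like_text": term}}
    return {"query": {"dis_max": {"queries": [exact_clause, fuzzy_clause]}}}


if __name__ == "__main__":
    # Print a body that could be passed to curl with -d.
    print(json.dumps(build_dis_max_query("coleman"), indent=2))
```

Whether the boosted clause actually wins depends on how the two clauses score on a given index, so the result is worth checking with "explain": true.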

Thanks!

-Woody

RE: fuzzy matching and direct hit ranking

rpsandiford

You might want to check that you are lowercasing at indexing time, because the distance between “Coleman” and “coleman” is the same as the distance between “Boleman” and “coleman”: both require a one-character replacement.
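
The point about equal distances can be checked with a short sketch. This is a plain textbook Levenshtein implementation, not the code Lucene or Elasticsearch uses internally; the function name is chosen for illustration.

```python
def levenshtein(a, b):
    """Return the edit distance (insertions, deletions, substitutions) between a and b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for indexed in ("Coleman", "Boleman"):
        # Distance to the query as indexed, then after lowercasing the indexed term.
        print(indexed,
              levenshtein(indexed, "coleman"),
              levenshtein(indexed.lower(), "coleman"))
```

Without lowercasing, both indexed terms are one edit away from 'coleman'. After lowercasing, 'Coleman' is an exact match (distance 0) while 'Boleman' stays at distance 1.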

 

Bob Sandiford | Lead Software Engineer SirsiDynix

From: Woody Peterson [via ElasticSearch Users] [mailto:[hidden email]]
Sent: Tuesday, May 15, 2012 2:23 PM
To: Bob Sandiford
Subject: fuzzy matching and direct hit ranking

I'm trying to implement fuzzy matching, and wanted to get a sanity check, as I'm hitting what was initially a surprising use case.

Let's say I have an index with documents: [{'name' => 'Coleman'}, {'name' => 'Boleman'}]. Doing a fuzzy_like_this search for 'coleman' will non-deterministically return either document first, whereas before I read up on the details of fuzzy search I would have expected the results to have taken distance into account.

After reading some of the documentation and relevant posts on this forum, I understand that what it's doing is expanding the search to all terms in the index within a percentage-wise distance of the word. So in the above example, my current understanding is that a search for 'coleman' is literally the same thing as searching for 'coleman' and 'boleman'.

First of all, is this correct? Second, how do I achieve my desired behavior?

My first thought is to do a dis_max query with both a text and fuzzy_like_this. Would anyone pursue a different strategy instead?

Thanks!

-Woody

Bob.

Re: fuzzy matching and direct hit ranking

Woody Peterson
> the distance between “Coleman” and “coleman” is the same as the distance between “Boleman” and “coleman”

I tried a ton of stuff yesterday, and that might explain the differences I was seeing between fuzzy_like_this and 'text' searches with a fuzziness ('text' searches go through normal tokenization, while fuzzy_like_this apparently doesn't). I think if your field uses the standard analyzer, it should be lowercased on index, so you would just have to lowercase it on search for fuzzy_like_this, or use 'text', no?

I eventually settled on the following, although it still exhibits this edge case bug:

curl -XPUT 'http://localhost:9200/test/user/1' -d '{"name":"Coleman"}'
curl -XPUT 'http://localhost:9200/test/user/2' -d '{"name":"Boleman"}'

Now do the following, replacing 'Coleman' with 'Boleman', 'coleman', and 'boleman':

curl -X GET "http://localhost:9200/test/_search?pretty=true" -d '{"query":{"text":{"_all":{"query":"Coleman","fuzziness":0.8}}},"explain":true}'

I get a score of 0.30685282 for both documents in all search variations. Contrary to what I stated in my first email, the order is *not* indeterminate. I only see it vary in some cases with my particular dataset: every once in a while a document picks up some tiny additional factor, such as a slightly different fieldNorm, that boosts it over the other and appears nonsensical to the user. This test case, however, is consistent, although it isn't clear to me what secondary sorting keeps 'Coleman' always ahead of 'Boleman'; I'm assuming insertion order.
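
A small sketch of the effect described above, using made-up numbers. Treating the final score as a base score multiplied by `field_norm` is a simplifying assumption, not Lucene's full scoring formula, and Python's stable sort merely stands in for whatever tie-breaking the engine applies. The 0.625 value is hypothetical.

```python
def rank(docs):
    """Sort by score descending; the sort is stable, so ties keep input order."""
    ordered = sorted(docs, key=lambda d: d["base"] * d["field_norm"], reverse=True)
    return [d["name"] for d in ordered]


# Both documents score identically, as in the two-document test case.
tied = [
    {"name": "Coleman", "base": 0.30685282, "field_norm": 1.0},
    {"name": "Boleman", "base": 0.30685282, "field_norm": 1.0},
]

# Hypothetical: the exact match happens to carry a smaller field norm.
skewed = [
    {"name": "Coleman", "base": 0.30685282, "field_norm": 0.625},
    {"name": "Boleman", "base": 0.30685282, "field_norm": 1.0},
]

if __name__ == "__main__":
    print(rank(tied))    # tie: input order is preserved
    print(rank(skewed))  # the fuzzy match now outranks the exact one
```

With identical scores the first-inserted document stays first. Once any secondary factor differs, the fuzzy match can overtake the exact one, because nothing in the score reflects edit distance.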

-Woody

On Wednesday, May 16, 2012 7:58:29 AM UTC-7, Bob Sandiford wrote:
You might want to check that you are doing a lower case at indexing
time.

Because – the distance between “Coleman” and “coleman” is the same as
the distance between “Boleman” and “coleman” – i.e. both require a one-
character replacement.

Bob.

Re: fuzzy matching and direct hit ranking

Woody Peterson

> curl -XPUT 'http://localhost:9200/test/user/1' -d '{"name":"Coleman"}'
> curl -XPUT 'http://localhost:9200/test/user/2' -d '{"name":"Boleman"}'
> ... I get a score of 0.30685282 for both documents in all search variations

I also meant to point out that I get exactly the same results (same score and everything) when using '{"name":"coleman"}' and '{"name":"boleman"}' as the documents, which I would predict, but it is nice to verify.

On Wednesday, May 16, 2012 9:46:25 AM UTC-7, Woody Peterson wrote:
> the distance between “Coleman” and “coleman” is the same as the distance between “Boleman” and “coleman”

I tried a ton of stuff yesterday, and that might explain differences I was seeing between fuzzy_like_this vs 'text' searches w/ a fuziness ('text' searches go through normal tokenization, while fuzzy_like_this apparently doesn't). I think if your field uses the standard analyzer, it should be lowercased on index, so you would just have to lowercase it on search for fuzzy_like_this or use 'text', no?

I eventually settled on the following, although it still exhibits this edge case bug:

curl -XPUT 'http://localhost:9200/test/user/1' -d '{"name":"Coleman"}'
curl -XPUT 'http://localhost:9200/test/user/2' -d '{"name":"Boleman"}'

Now do the following, replacing 'Coleman' with 'Boleman', 'coleman', and 'boleman':

curl -X GET "http://localhost:9200/test/_search?pretty=true" -d '{"query":{"text":{"_all":{"query":"Coleman","fuzziness":0.8}}},"explain":true}'

I get a score of 0.30685282 for both documents in all search variations. The order is *not* indeterminate as I stated in my first email, this is only seen in some cases of my particular dataset because every once in a while it will hit a document with some tiny additional factor, such as a slightly different fieldNorm, that will boost one doc over the other and appear to the user as nonsensical. This test case, however, is consistent, although it isn't clear to me what secondary sorting is going on to keep 'Coleman' always ahead of 'Boleman'; I'm assuming insertion order.

-Woody

Re: fuzzy matching and direct hit ranking

Woody Peterson
Based on Bob's reaction, I would say at least one other person would expect elasticsearch to rank direct hits above close fuzzy hits. If this is not the case, is it a bug, or am I doing it wrong?

On Wednesday, May 16, 2012 9:52:13 AM UTC-7, Woody Peterson wrote:

> curl -XPUT 'http://localhost:9200/test/user/1' -d '{"name":"Coleman"}'
> curl -XPUT 'http://localhost:9200/test/user/2' -d '{"name":"Boleman"}'
> ... I get a score of 0.30685282 for both documents in all search variations

I also meant to point out that I get exactly the same results (same score and everything) when doing '{"name":"coleman"}' and '{"name":"boleman"}', also. Which I would predict, but is nice to verify.

Re: fuzzy matching and direct hit ranking

kimchy
Administrator
Yes, you will need to run both a text query and a fuzzy query, and rank the text one higher...
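
A toy model of this suggestion, not Elasticsearch's actual scoring: each document gets a boosted contribution from an exact clause plus a flat contribution from a fuzzy clause. The boost of 2.0, the flat fuzzy score of 1.0, and the "at most one substituted character" rule are all assumptions made to keep the sketch short.

```python
def one_substitution_apart(a, b):
    """True if a and b have equal length and differ in at most one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


def rank(term, docs, exact_boost=2.0):
    """Rank docs by (boosted exact clause) + (fuzzy clause), highest first."""
    scored = []
    for doc in docs:
        name = doc["name"].lower()  # mimic lowercasing at index time
        exact = exact_boost if name == term else 0.0
        fuzzy = 1.0 if one_substitution_apart(name, term) else 0.0
        scored.append((exact + fuzzy, doc["name"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for score, name in scored if score > 0]


if __name__ == "__main__":
    docs = [{"name": "Boleman"}, {"name": "Coleman"}]
    print(rank("coleman", docs))  # direct hit first
    print(rank("boleman", docs))  # direct hit first
```

Because the exact clause only fires for the direct hit, that document always collects the extra boost and sorts ahead of documents that match through the fuzzy clause alone.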

On Thu, May 17, 2012 at 8:02 PM, Woody Peterson <[hidden email]> wrote:
Based on Bob's reaction, I would say at least one other person would expect elasticsearch to rank direct hits above close fuzzy hits. If this is not the case, is it a bug, or am I doing it wrong?


Re: fuzzy matching and direct hit ranking

Woody Peterson
This thread, http://www.gossamer-threads.com/lists/lucene/java-user/144162, has the following comment, which suggests that the choice of strategy is a Lucene option:

> it should score "farming" higher than "farmin" by default, but the default
> rewrite mode also takes TF/IDF into account (in addition). You can change
> that by a different rewrite method:
>
> The default is: http://goo.gl/JhHOA (which combines the standard vector
> model with additionally boosting exact matches - we have that for backwards
> compatibility only, its not what most users expect)
>
> The better one is: http://goo.gl/0eJ47, which does not take TF/IDF into
> account and only boosts by levensthein distance.
>
> You can disable fuzzy boosting altogether:
> Additionally http://goo.gl/VWlkW provides two other scoring models (TF/IDF
> only, no boosting - or constant score at all)

Would it make sense to expose these options for use in elasticsearch?
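
To make the "boost only by Levenshtein distance" model concrete, here is a rough sketch of ranking by edit distance alone, ignoring TF/IDF. The similarity and boost formulas approximate how classic Lucene fuzzy matching is understood to scale its boost and should be treated as assumptions. `min_similarity=0.8` mirrors the `fuzziness` value used earlier in the thread, and all function names are hypothetical.

```python
def levenshtein(a, b):
    """Textbook edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def distance_boost(query, term, min_similarity=0.8):
    """Return a boost in (0, 1] if term is similar enough to query, else None."""
    if not query or not term:
        return None
    if query == term:
        return 1.0
    # Assumed formula: distance normalised by the shorter of the two terms.
    similarity = 1.0 - levenshtein(query, term) / min(len(query), len(term))
    if similarity <= min_similarity:
        return None
    # Rescale so the boost approaches 0 at the threshold and 1 for an exact match.
    return (similarity - min_similarity) / (1.0 - min_similarity)


def rank_by_distance(query, terms, min_similarity=0.8):
    """Order matching terms purely by their distance-based boost."""
    boosted = [(distance_boost(query, t, min_similarity), t) for t in terms]
    matches = [(b, t) for b, t in boosted if b is not None]
    return [t for b, t in sorted(matches, key=lambda pair: pair[0], reverse=True)]


if __name__ == "__main__":
    print(rank_by_distance("coleman", ["boleman", "coleman", "zebra"]))
```

Under this model an exact term always gets the maximum boost, so 'coleman' sorts ahead of 'boleman' regardless of term frequencies.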

On Sunday, May 20, 2012 1:05:04 PM UTC-7, kimchy wrote:
Yes, you will need to do text query and fuzzy and rank the text one higher...

Re: fuzzy matching and direct hit ranking

kimchy
Administrator
Sadly, the Lucene query parser does not expose the ability to set the rewrite method for fuzzy queries, but we can work around that. You can set the rewrite method for query_string, but it applies only to wildcard and prefix queries. We can add a fuzzy rewrite method option here as well...

On Mon, May 21, 2012 at 6:30 PM, Woody Peterson <[hidden email]> wrote:
This thread http://www.gossamer-threads.com/lists/lucene/java-user/144162 has the following comment that suggests it is a lucene option which strategy to take:

> it should score "farming" higher than "farmin" by default, but the default
> rewrite mode also takes TF/IDF into account (in addition). You can change
> that by a different rewrite method:
>
> The default is: http://goo.gl/JhHOA (which combines the standard vector
> model with additionally boosting exact matches - we have that for backwards
> compatibility only, its not what most users expect)
>
> The better one is: http://goo.gl/0eJ47, which does not take TF/IDF into
> account and only boosts by levensthein distance.
>
> You can disable fuzzy boosting altogether:
> Additionally http://goo.gl/VWlkW provides two other scoring models (TF/IDF
> only, no boosting - or constant score at all)

Would it make sense to expose these options for use in elasticsearch?

Re: fuzzy matching and direct hit ranking

kimchy
Administrator
I had another look, and it seems we can hook specific fuzzy settings into the Lucene query parser and other places. I opened an issue: https://github.com/elasticsearch/elasticsearch/issues/1974.

On Wed, May 23, 2012 at 11:57 PM, Shay Banon <[hidden email]> wrote:
Sadly, the Lucene query parser does not expose the ability to set the rewrite method for fuzzy query, but we can work around that. You can set the rewrite method for the query_string, but it applies to wildcard and prefix queries. We can add a fuzzy rewrite method option as well here...

