> This actually has nothing to do with Lucene, but with how elasticsearch

> derives field types and handles "" text for numeric values.

>

> First, deriving a type for a field: when a field is first introduced,

> its type is derived from its value. This will not work well if the first

> document introducing nextDay has an empty string there, since the type for

> the field will then be string, and not a number (long / double).
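
The "first value wins" behavior described above can be illustrated with a toy model. This is only a sketch of the idea, not Elasticsearch code: `derive_type` and `ToyMapping` are hypothetical names invented for the illustration, and the real mapper handles many more cases.

```python
# Toy model of "first value wins" type derivation. This is NOT Elasticsearch
# code; derive_type and ToyMapping are hypothetical names for illustration.

def derive_type(value):
    """Guess a field type from the first value seen for that field."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"  # an empty string is still a string
    raise TypeError(f"unsupported value: {value!r}")


class ToyMapping:
    def __init__(self):
        self.types = {}

    def index(self, doc):
        for field, value in doc.items():
            if value is None:
                continue  # null: the field is skipped and no type is derived
            # setdefault records a type only the first time a field is seen
            self.types.setdefault(field, derive_type(value))
            if self.types[field] in ("long", "double") and isinstance(value, str):
                # float("") raises ValueError, loosely analogous to the
                # NumberFormatException thrown by Double.parseDouble("")
                float(value)


# Case 1: the first document carries "", so the field becomes a string.
first_empty = ToyMapping()
first_empty.index({"nextDay": ""})
first_empty.index({"nextDay": 12.5})
print(first_empty.types)  # {'nextDay': 'string'}

# Case 2: the first document carries a number, so a later "" is rejected.
first_number = ToyMapping()
first_number.index({"nextDay": 12.5})
try:
    first_number.index({"nextDay": ""})
except ValueError as exc:
    print("rejected:", exc)
```

Either ordering is a problem: an empty string seen first fixes the wrong type, and an empty string seen later fails to parse.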

>

> As for empty text: yes, it will fail to index the doc if an empty string

> is provided for a numeric field. As you mentioned, a null value for the

> field is handled, but empty text is not treated as a null value.
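
Since a null value is handled but an empty string is not, one possible workaround is to turn empty strings into nulls before the document is indexed. The following is a minimal sketch under that assumption; `empty_strings_to_null` is a hypothetical helper, and where it would run (for example, in a step between the data source and the index) is left open.

```python
import json


def empty_strings_to_null(value):
    """Recursively replace "" with None (JSON null) in dicts and lists."""
    if isinstance(value, dict):
        return {key: empty_strings_to_null(item) for key, item in value.items()}
    if isinstance(value, list):
        return [empty_strings_to_null(item) for item in value]
    if isinstance(value, str) and value == "":
        return None
    return value


# The document fragment from the original report.
doc = {
    "shipping": [
        {"nextDay": "", "vendorDelivery": 69.99, "ground": "", "secondDay": ""}
    ]
}

cleaned = empty_strings_to_null(doc)
print(json.dumps(cleaned))
# {"shipping": [{"nextDay": null, "vendorDelivery": 69.99, "ground": null, "secondDay": null}]}
```

Note that this blanket conversion also nulls out empty values in genuine string fields; restricting it to known numeric field names would be a safer variant.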

>

> On Tue, Oct 18, 2011 at 9:29 PM, pulkitsinghal <[hidden email]> wrote:

>

> > BTW, please forgive me in advance for even mentioning the word Solr in

> > this forum because I know ES folks cringe at comparisons between the

> > two technologies. I understand they are different and I am simply

> > making an analogy for the "Data Input & Indexing behavior" angle ...

> > so bear with me here.

>

> > The stacktrace from the ES server's NFE is at the end of this thread.

>

> > I have faced similar NumberFormatException issues before in Solr as

> > well. I think these happen simply because the underlying Lucene isn't

> > ready to accept/ignore an empty string for numbers or date/time data.

> > So I am assuming that this is no different for ES which is built atop

> > Lucene as well. (1) Let me know if you agree with me so far.

>

> > In Solr, I got around this by having its Data Import Handler run

> > scripts on the incoming documents to either place a number like -1 as

> > a placeholder or by removing the field explicitly from the document

> > construction.

>

> > So with ES, I was hoping it would be more straightforward. My feed in

> > ES is the magical and much revered CouchDB river :) And I try not to

> > define the mappings myself because ES does such a great job of

> > figuring them out and it is one of the many many many conveniences of

> > ES that I want to take advantage of.

>

> > I was hoping that ES would acknowledge the fact that letting empty

> > strings through (for core type fields like number, date and time) has

> > no merit and would simply ignore the empty values. (2) Is this a "bad"

> > thing to hope for?

>

> > The data that failed looks like:

> > "shipping" :
> > [
> >   {
> >     "nextDay" : "",
> >     "vendorDelivery" : 69.99,
> >     "ground" : "",
> >     "secondDay" : ""
> >   }
> > ]

> > So imagine my surprise at how well ES did, in order to be able to

> > guess that shipping.nextDay was supposed to be a number! But then not

> > ignoring the junk pumped into it as an empty string.

>

> > (2) I'm not bad mouthing ES, I'm asking: Can we expect ES to tackle

> > this or would we be wrong to place such an expectation on ES?

>

> > (3) If the data appropriately had a null value then ES would have

> > handled it already because when there is a (JSON) null value for the

> > field and the null_value has not been set up, then ES defaults to not

> > adding the field at all. That is not the case here so what would the

> > workaround be? If any? Sanitize my data? Oh lord the tears are rolling

> > down my cheeks, please say that's not my only option.

>

> > Please let me know what you think.

>

> > === STACKTRACE ====

> > org.elasticsearch.index.mapper.MapperParsingException: Failed to parse [shipping.nextDay]
> >     at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:312)
> >     at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:577)
> >     at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:443)
> >     at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:491)
> >     at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:557)
> >     at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:435)
> >     at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:567)
> >     at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:491)
> >     at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:289)
> >     at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:131)
> >     at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:464)
> >     at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:377)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >     at java.lang.Thread.run(Thread.java:680)
> > Caused by: java.lang.NumberFormatException: empty String
> >     at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:992)
> >     at java.lang.Double.parseDouble(Double.java:510)
> >     at org.elasticsearch.common.xcontent.support.AbstractXContentParser.doubleValue(AbstractXContentParser.java:88)
> >     at org.elasticsearch.index.mapper.core.DoubleFieldMapper.parseCreateField(DoubleFieldMapper.java:227)
> >     at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:299)
> >     ... 14 more