Highlighting boundary characters are not working in Elasticsearch 1.7.1

Himanshu Arora
This post has NOT been accepted by the mailing list yet.
I am using Elasticsearch and its Java API, version 1.7.1, and I have a simple highlighting problem with boundary characters.
Here is how I set the source content:

        XContentBuilder source = jsonBuilder().startObject();
        source.field(PROPERTY_BOOK_ID, bookId)
              .field(PROPERTY_CONTENT, parsedContent)
              .field("term_vector", "with_positions_offsets")
              .field(PROPERTY_FILENAME, file.getName())
              .field(PROPERTY_ATTACHMENT, Base64.encodeBase64String(FileUtils.readFileToByteArray(file)));

As per the documentation, boundary characters work when term_vector is set to with_positions_offsets.
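
For comparison, the Elasticsearch 1.x documentation describes term_vector as a field mapping parameter, so it would normally be declared in the index mapping rather than sent as a field of the document itself. Below is a minimal sketch of such a mapping, built as a plain JSON string. The type name "book" and the field name "content" are placeholders (the real values of INDEX_TYPE and PROPERTY_CONTENT are not shown in this post), and the class and method names are hypothetical.

```java
public class TermVectorMappingSketch {

    /**
     * Builds a 1.x-style mapping body that enables term vectors with
     * positions and offsets on one string field.
     * typeName and fieldName are placeholders, not values from the post.
     */
    public static String mappingJson(String typeName, String fieldName) {
        return "{\"" + typeName + "\":{\"properties\":{\"" + fieldName + "\":{"
                + "\"type\":\"string\",\"term_vector\":\"with_positions_offsets\"}}}}";
    }

    public static void main(String[] args) {
        // Assumed names, for illustration only.
        System.out.println(mappingJson("book", "content"));
    }
}
```

A mapping like this would have to be in place before the documents are indexed, since term vectors are written at index time.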

But when I query Elasticsearch with boundary characters, it gives me the wrong response. Here is my search query, with the search term "poem":

        QueryBuilder query = boolQuery()
                .must(QueryBuilders.textPhraseQuery(PROPERTY_BOOK_ID, bookId))
                .must(QueryBuilders.queryStringQuery("*" + searchTerm + "*"));

        Map<String, Object> highlighterOptions = new HashMap<>();
        highlighterOptions.put("boundary_chars", "s.,!?\\t\\n\b");

        final SearchResponse response = searchClientService.getClient()
                .prepareSearch(INDEX_NAME).setTypes(INDEX_TYPE)
                .setHighlighterQuery(query)
                .addHighlightedField(PROPERTY_CONTENT)
                .setHighlighterOptions(highlighterOptions)
                .setExplain(true)
                .setSize(5000)
                .setFrom(0)
                .setHighlighterBoundaryMaxScan(10)
                .setHighlighterFragmentSize(50)
                .setHighlighterNumOfFragments(5000)
                .execute().actionGet();
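
One detail in the query above that may be worth checking is what the boundary_chars literal actually contains once Java has processed its escapes. The sketch below prints each character of a string visibly, using only plain Java. The class and method names are hypothetical, and the second string is only a guess at the intended set (space, punctuation, tab, newline), not something stated in this post.

```java
public class BoundaryCharsCheck {

    /** Renders each character visibly; control characters and spaces are shown as code points. */
    public static String describe(String chars) {
        StringBuilder out = new StringBuilder();
        for (char c : chars.toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (c < 0x20 || c == ' ') {
                out.append(String.format("U+%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal from the query: letter 's', two backslash pairs, and a backspace.
        System.out.println(describe("s.,!?\\t\\n\b"));
        // prints: s . , ! ? \ t \ n U+0008

        // An assumed intended set: real space, tab and newline characters.
        System.out.println(describe(" .,!?\t\n"));
        // prints: U+0020 . , ! ? U+0009 U+000A
    }
}
```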
Result of the search:
0)English Literature poetry book, with poems from leading
1)you will

enjoy these poems during your GCSE
2)course and later in life.

Many of the poems deal
3). There are poems that will reflect your own ideas
4)you make the most of the poems and of your GCSE. It
5)in writing about and comparing poems for GCSE
6)you today.

Poems past and present – the AQA

Expected:
0)English Literature poetry book, with poems from leading
1)enjoy these poems during your GCSE
2)Many of the poems deal
3)There are poems that will reflect your own ideas
4)you make the most of the poems and of your GCSE.
5)in writing about and comparing poems for GCSE
6)Poems past and present – the AQA

Did I miss something in the query or while indexing the document? Or did I misunderstand the concept of boundary characters, given that Elasticsearch returns the excerpts shown under the result above instead of the expected ones?
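
As a rough model of the concept in question: a boundary scanner looks backwards from the raw fragment start for at most boundary_max_scan characters and moves the start to just after the first boundary character it finds. If no boundary character is found within that window, the start stays where it was. The sketch below is a simplified illustration of that idea in plain Java, not the actual Lucene or Elasticsearch code. The class name, method names, and sample text are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class BoundaryScanSketch {

    /** Turns a string into a set of its characters. */
    public static Set<Character> charSet(String chars) {
        Set<Character> set = new HashSet<Character>();
        for (char c : chars.toCharArray()) {
            set.add(c);
        }
        return set;
    }

    /**
     * Scans backwards from start, at most maxScan characters, and returns the
     * offset just after the first boundary character found. Returns 0 if the
     * beginning of the text is reached, and the unchanged start otherwise.
     */
    public static int findStartOffset(String text, int start,
                                      Set<Character> boundaryChars, int maxScan) {
        if (start > text.length() || start < 1) {
            return start;
        }
        int offset = start;
        for (int count = maxScan; offset > 0 && count > 0; count--) {
            if (boundaryChars.contains(text.charAt(offset - 1))) {
                return offset;
            }
            offset--;
        }
        return offset == 0 ? 0 : start;
    }

    public static void main(String[] args) {
        String text = "life.\n\nMany of the poems deal";
        Set<Character> boundaries = charSet(".\n");

        // Raw fragment start at index 10 (the 'y' of "Many").
        System.out.println(findStartOffset(text, 10, boundaries, 10)); // 7, the 'M' of "Many"
        System.out.println(findStartOffset(text, 10, boundaries, 2));  // 10, no boundary within 2 chars
    }
}
```

Under this model, a small boundary_max_scan (10 in the query above) could leave a fragment starting mid-sentence whenever the nearest boundary character is further away than that window.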

Thanks in advance