Lucene 4.0

Lucene 4.0

kimchy
Administrator
Heya fellows,

    Lucene 4.0 is out the door, and I wanted to update you all on how we plan to upgrade elasticsearch to it. The plan is to first release 0.20, still using Lucene 3.6.x; this should happen in the next week or so. The reason is that we want to get the 0.20 features into the hands of users as fast as possible, without waiting for the upgrade to 4.0.

    We will also start the process of upgrading to 4.0, which will be in the next major elasticsearch version. This will include upgrading to 4.0, making use of the new features, and exposing them to users. This shouldn't take too long, though we do want to see Lucene 4.0.0 GA "out in the wild" for a bit before making a formal elasticsearch release (0.21) with it.

-shay.banon

Re: Lucene 4.0

Matt Weber-2
Awesome, I can't wait!  

Re: Lucene 4.0

Eugene Strokin
In reply to this post by kimchy
Shay, thanks for the update.
Just wondering about backward compatibility.
Will switching from 0.19 to 0.20 require reindexing, or can we just update the libs?
And from 0.20 to 0.21, I guess it will require reindexing, but I'd like to hear confirmation from you.

Thank you


Re: Lucene 4.0

Ivan Brusic
Lucene is always backwards compatible with the previous major version, so Lucene 4.0 code should be able to read a 3.x index. Writing indexes with a newer version and reading them with an older one is more problematic. That said, a new ES version is more than just a Lucene upgrade, so it might require a reindex.

ES version incompatibility is always due to the internal communication API changing.

-- 
Ivan

Re: Lucene 4.0

phill
On 10/11/2012 1:40 PM, Ivan Brusic wrote:
> Lucene is always backwards compatible within one version, so Lucene
> 4.0 code should be able to read a 3.x index. Writing indexes with a
> newer version and reading them in an older one is more problematic.
> That said, a new ES version is more than just a Lucene upgrade, so it
> might require a reindex.

Here is the statement from the release notes

http://lucene.apache.org/core/4_0_0-BETA/changes/Changes.html#4.0.0-alpha.changes_in_backwards_compatibility_policy

"On upgrading to 4.0, if you do not fully reindex your documents, Lucene
will emulate the new flex API on top of the old index, incurring some
performance cost (up to ~10% slowdown, typically). To prevent this
slowdown, use oal.index.IndexUpgrader to upgrade your indexes to latest
file format (LUCENE-3082
<http://issues.apache.org/jira/browse/LUCENE-3082>)."

Thus, as app developers we don't have to do the reindexing work ourselves; when it suits us,
we can trigger an upgrade of each Lucene index using a conversion tool.

http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/index/IndexUpgrader.html

I'm sure this feature will be useful in ES.

-Paul
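
For anyone who wants to script the upgrade Paul describes, here is a minimal sketch in Python that only assembles the command line for Lucene's IndexUpgrader tool. The `-verbose` and `-delete-prior-commits` flags are the ones listed in the IndexUpgrader javadoc linked above; the jar name, the index path, and the helper name `build_upgrade_command` are hypothetical placeholders, and nothing here is specific to ES.

```python
import shlex


def build_upgrade_command(lucene_core_jar, index_dir,
                          verbose=False, delete_prior_commits=False):
    """Return the argv list for running Lucene's IndexUpgrader on one index.

    lucene_core_jar and index_dir are placeholders supplied by the caller;
    this helper does not check that either path exists.
    """
    cmd = ["java", "-cp", lucene_core_jar,
           "org.apache.lucene.index.IndexUpgrader"]
    if delete_prior_commits:
        cmd.append("-delete-prior-commits")
    if verbose:
        cmd.append("-verbose")
    cmd.append(index_dir)  # the index directory is the last argument
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_upgrade_command("lucene-core-4.0.0.jar", "/data/my-index",
                                verbose=True)
    # Print a copy-pasteable command line instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)`. The tool opens the index for writing, so it is probably safest to run it only while nothing else is writing to that index.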

Re: Lucene 4.0

Eugene Strokin
Thank you for the replies. It would be a very nice feature if ES upgraded the index to v4, either automatically or via some command.
I will be waiting for the new versions of ES.
Thanks again.


Re: Lucene 4.0

kimchy
Administrator
Heya, few points:

Indexes in elasticsearch have been "backward" compatible so far, and when we upgrade to Lucene 4.0 they will remain backward compatible (both on the Lucene level and on the ES level). You might need to do an "upgrade" to make use of newer versions, or to make sure there is no "emulation" layer on the Lucene level.

So, upgrading to 0.20 and 0.21 will be compatible index-wise (no need to reindex), but you will need to do a full cluster restart (though we are working on eventually not needing that either; the first infrastructure for that is already going to be in the upcoming 0.20).

Re: Lucene 4.0

simonw-2
In reply to this post by Ivan Brusic


On Thursday, October 11, 2012 10:40:53 PM UTC+2, Ivan Brusic wrote:
Lucene is always backwards compatible within one version, so Lucene 4.0 code should be able to read a 3.x index. Writing indexes with a newer version and reading them in an older one is more problematic. That said, a new ES version is more than just a Lucene upgrade, so it might require a reindex.

This question has been asked a couple of times, so let me elaborate a little for those who are interested in what the main differences between 4.0 and 3.x are on the index level. It is correct in general that Lucene 4.0 can read 3.x indexes, but this comes with a cost this time. In Lucene 4.0 we changed the sort order from UTF-16 to UTF-8 on the lowest level, so 3.x indices are sorted "differently". To make this still work with the 4.0 API, we added a "re-mapping" layer for surrogate characters to maintain the correct sort order. This will have some cost in performance, but it should not be dramatic. Still, it is a good idea to upgrade the index. The good news is that Lucene can do that by itself in the background: if a 3.x segment is merged by Lucene 4.0, the result is written as a 4.0 segment, so you can "upgrade over time". Anyhow, in practice ES users should not need this knowledge, and we will provide flexible ways to upgrade to an ES version that runs Lucene 4.0.

simon 
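
To make the sort-order change concrete, here is a small Python sketch (not Lucene code) that sorts the same strings by their UTF-16 code units and by their UTF-8 bytes. The sample terms are arbitrary choices; the two orders differ only when a character outside the Basic Multilingual Plane, which UTF-16 encodes as a surrogate pair, is compared with a BMP character above the surrogate range.

```python
def utf16_key(term):
    # Big-endian UTF-16 bytes compare the same way as 16-bit code units.
    return term.encode("utf-16-be")


def utf8_key(term):
    # UTF-8 byte order matches Unicode code point order.
    return term.encode("utf-8")


# "a" (U+0061), fullwidth tilde (U+FF5E, in the BMP above the surrogates),
# and an emoji (U+1F600, a surrogate pair D83D DE00 in UTF-16).
terms = ["\uff5e", "\U0001f600", "a"]

utf16_order = sorted(terms, key=utf16_key)
utf8_order = sorted(terms, key=utf8_key)

if __name__ == "__main__":
    print([hex(ord(t)) for t in utf16_order])  # ['0x61', '0x1f600', '0xff5e']
    print([hex(ord(t)) for t in utf8_order])   # ['0x61', '0xff5e', '0x1f600']
```

In UTF-16 order the surrogate pair (lead unit 0xD83D) sorts before U+FF5E, while in UTF-8 order it sorts after it. That is the kind of case a "re-mapping" layer for surrogates has to handle.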

Re: Lucene 4.0

Tanguy
Thank you Shay & Simon for giving us those details. Customers are already wondering when Lucene 4 will be available in ES.

-- Tanguy
@tlrx

Re: Lucene 4.0

Clinton Gormley-2
In reply to this post by kimchy

> So, upgrading to 0.20 and 0.21 will be compatible index wise (no need
> to reindex), but you will need to do a full cluster restart (though we
> are working on eventually not needing that as well, first
> infrastructure for that is already going to be in upcoming 0.20).

This is excellent news!

Re: Lucene 4.0

Ivan Brusic
In reply to this post by Matt Weber-2
Lucene 4.0 was officially released today.

http://lucene.apache.org/core/corenews.html

Re: Lucene 4.0

joergprante@gmail.com
In reply to this post by kimchy
Hi Shay,

Lucene 4 just got released :)

I would love to get some pointers on what kind of help from the community would be most welcome.

For example, with Lucene 4.0, will there be an elasticsearch-index-spellcheck module? How about introducing modules for codecs? What will support for Lucene payloads look like in ES? Just to mention a few exciting things...

Best regards,

Jörg


Re: Lucene 4.0

kimchy
Administrator
The plan is to first get Lucene 4.0 integrated with elasticsearch, and then expose all the new features. We will take it feature by feature, but to your points: there will be a built-in spellcheck using the new "direct" spellcheck feature, and you will be able to configure codecs in the mapping and write a plugin that introduces new codecs, and so on...

Re: Lucene 4.0

Ivan Brusic
In reply to this post by joergprante@gmail.com
Lucid Imagination has an annotated version of the “Release Highlights” for Lucene/Solr 4.0


Andrzej Białecki, Robert Muir, and Grant Ingersoll will be presenting a paper on the Lucene 4 architecture: http://opensearchlab.otago.ac.nz/paper_10.pdf 

Good reading if you'd like to understand the guts.

-- 
Ivan

Re: Lucene 4.0

banderon1
In reply to this post by kimchy
Shout-out on TechCrunch!
http://techcrunch.com/2012/10/12/open-source-search-engine-apache-lucenesolr-gets-big-update

Re: Lucene 4.0

Nicolas Blanc
In reply to this post by kimchy
Can I expect to see the grouping feature in 0.20? Today I work with a 0.19.4 version of ES, with Martijn's code for grouping and some custom patches to use more facet types than just string terms. And I really want to stop maintaining a custom version of ES :)

Or do I need to wait for the next release, 0.21, based on the new Lucene 4.0?

Thanks in advance,

--
Nicolas BLANC.

Re: Lucene 4.0

Aaron Rosenthal
In reply to this post by kimchy
We would like to plan for 0.21. What's a realistic timetable? Thanks, great work!

Re: Lucene 4.0

Ivan Brusic
Still waiting for .20 to be released! :)

Re: Lucene 4.0

kimchy
Administrator
0.20.0.RC1 was already released, working on the blog post now.

Re: Lucene 4.0

Robin Verlangen
I'm really looking forward to this. Good job, guys, keep moving forward!

Best regards, 

Robin Verlangen
Software engineer



