Why is index not written to hdfs?


Mohit Anchlia
I have the Hadoop plugin installed with the HDFS gateway configured, but the indexes are still being written locally. Can you please help me understand why?

# ls -ltr data/elasticsearch/nodes/0/indices/twitter/0/index/
total 12
-rw-r--r-- 1 root root 0 May 14 15:33 write.lock
-rw-r--r-- 1 root root 20 May 14 15:33 segments.gen
-rw-r--r-- 1 root root 58 May 14 15:33 segments_1
-rw-r--r-- 1 root root 8 May 14 15:33 _checksums-1337034789114

# hadoop fs -ls /elasticsearch
# (returns nothing)

config:

gateway.type: hdfs
gateway.hdfs.uri: hdfs://db1:54310
gateway.hdfs.path: elasticsearch
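One possible explanation for the empty listing, offered as an assumption and not the confirmed cause: the config sets a relative gateway.hdfs.path (elasticsearch), but the check lists the absolute path /elasticsearch. Hadoop resolves relative paths against the user's HDFS home directory, by default /user/<username>. If the gateway passes the path through unchanged, the data would land under something like /user/root/elasticsearch, and "hadoop fs -ls elasticsearch" (no leading slash) would be the command to check. The helper below, resolve_hdfs_path, is hypothetical: it only mimics that default resolution rule in plain Python and does not talk to Hadoop. Treating "root" as the HDFS user is also an assumption, based on the file ownership in the listing.

```python
import posixpath


def resolve_hdfs_path(path: str, user: str) -> str:
    """Hypothetical helper mimicking Hadoop's default path resolution.

    Absolute paths are used as-is; relative paths are resolved against the
    default working directory, which is the user's HDFS home, /user/<user>.
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join("/user", user, path))


# Assuming the node talks to HDFS as "root", a relative path would land here:
print(resolve_hdfs_path("elasticsearch", "root"))   # /user/root/elasticsearch
# ...while the check in the post looked here:
print(resolve_hdfs_path("/elasticsearch", "root"))  # /elasticsearch
```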

Re: Why is index not written to hdfs?

Mohit Anchlia
I finally got this working. Does anyone know when Elasticsearch writes data to HDFS? Is it close to real time?

If I kill the node, can I lose some data?

On Mon, May 14, 2012 at 3:36 PM, Mohit Anchlia <[hidden email]> wrote:

Re: Why is index not written to hdfs?

Berkay Mollamustafaoglu-2
In reply to this post by Mohit Anchlia
There are two ways to configure ES for persistence (so that the data survives a full cluster restart):
1. Local gateway, where the data persists on the servers themselves.
2. Shared (central) gateway (S3, Hadoop, or a shared file system), where the data is also stored elsewhere.

In either case, the data is still stored locally. With a shared gateway, the data is restored from that store when a node restarts. For more information, I highly recommend reading the docs thoroughly: http://www.elasticsearch.org/guide/reference/modules/gateway/


Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
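The shared-gateway arrangement Berkay describes (data always written locally, a copy pushed to the shared store, recovery from that copy) can be pictured with a toy model. This is a minimal sketch, not Elasticsearch's implementation: the classes SharedGateway and Node and their methods are hypothetical names, and a plain dict stands in for both the local index and the gateway snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class SharedGateway:
    """Toy stand-in for a shared gateway store (S3, HDFS, shared FS)."""
    snapshot: dict = field(default_factory=dict)


@dataclass
class Node:
    gateway: SharedGateway
    # With either gateway type, the index itself lives on the node's local disk.
    local_index: dict = field(default_factory=dict)

    def index(self, doc_id: str, doc: dict) -> None:
        # Writes always go to the local index first.
        self.local_index[doc_id] = doc

    def snapshot_to_gateway(self) -> None:
        # The shared gateway receives a copy of the local data.
        self.gateway.snapshot = dict(self.local_index)

    def recover_from_gateway(self) -> None:
        # Simulate a restart where the local data is gone: rebuild it
        # from whatever the last gateway snapshot contained.
        self.local_index = dict(self.gateway.snapshot)


node = Node(SharedGateway())
node.index("1", {"user": "kimchy"})
node.snapshot_to_gateway()
node.index("2", {"user": "mohit"})  # written after the last snapshot
node.recover_from_gateway()
print(node.local_index)  # {'1': {'user': 'kimchy'}}
```

In this model, anything indexed after the last snapshot exists only in the local copy, which is the gap the follow-up question about losing data is getting at.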



Re: Why is index not written to hdfs?

Mohit Anchlia
Yes, I read that, and I have also done recovery testing, which seems to recover everything. My question was: when does Elasticsearch write/commit data to Hadoop? Is it synchronous or asynchronous? Should I expect to lose any data that might be in Elasticsearch's memory? I'm just trying to understand the basics.

On Mon, May 14, 2012 at 4:23 PM, Berkay Mollamustafaoglu <[hidden email]> wrote:

Re: Why is index not written to hdfs?

kimchy
Administrator
By default, Elasticsearch snapshots the data to HDFS every 10 seconds, so the writes are asynchronous. I answered your question about using the local gateway in the other thread you posted.
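A 10-second snapshot interval implies a window of up to roughly one interval during which recent writes exist locally but not yet in HDFS. The sketch below is an idealized model, not Elasticsearch's code: writes_in_last_snapshot is a hypothetical name, and the model only tracks what the HDFS copy contains. Per Berkay's reply, the data is still stored locally, so writes newer than the last snapshot are only at risk if that local data (and any replica holding it) is lost as well.

```python
def writes_in_last_snapshot(write_times, crash_time, interval=10.0):
    """Return the writes a periodic snapshot would have copied before a crash.

    Idealized model: snapshots fire at interval, 2*interval, ... seconds and
    each one copies every write made up to that moment. The 10-second default
    mirrors the default interval described in the thread.
    """
    last_snapshot = (crash_time // interval) * interval
    return [t for t in write_times if t <= last_snapshot]


writes = [1, 5, 12, 19, 23]  # seconds at which documents were indexed
kept = writes_in_last_snapshot(writes, crash_time=25)
lost = [t for t in writes if t not in kept]
print(kept)  # [1, 5, 12, 19]
print(lost)  # [23]
```

With a crash at t=25, the last snapshot ran at t=20, so only the write at t=23 is missing from the HDFS copy.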

On Tue, May 15, 2012 at 2:40 AM, Mohit Anchlia <[hidden email]> wrote: