[Just Pushed]: Cloud Plugin: Using the cloud for gateway storage, and for auto discovery
One of the features I most wanted to get into elasticsearch is now on master. ElasticSearch was built from the ground up to work well in cloud environments (and of course, outside the cloud as well), and now this vision is fully realized.
The first piece of support is the ability to use the cloud as gateway storage. This means that Amazon S3, Rackspace CloudFiles, or Azure Blob can be used to store both the cluster metadata and each index's data (index files and transaction logs). This really makes sense and, in the end, saves you money ;). The savings come from the fact that the index is stored locally (no need for EBS) and automatically mirrored to S3 / CloudFiles.
The second piece of support, and one of the reasons for the new Zen discovery module (which is the default in master and in future elasticsearch releases), is the ability to use cloud information for auto discovery of nodes. In most cloud environments, multicast is disabled. This means that a gossip router needs to be defined with an elastic IP, and, just for HA, more than one such node needs to be defined. With elasticsearch cloud discovery, all nodes are created equal: no need for elastic IPs or special nodes!
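To make the idea concrete, here is a minimal Python sketch of what "using cloud information for discovery" amounts to. It is an illustration only, not the plugin's actual code: the `Instance` record, the `discovery_targets` helper, and the use of port 9300 are all assumptions made for this example. The point is that every node can ask the cloud provider which machines are running and derive the same list of peers to contact, so no node needs a fixed address.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Instance:
    """Hypothetical record for one machine reported by a cloud compute API."""
    instance_id: str
    private_ip: str
    state: str


def discovery_targets(instances: Iterable[Instance],
                      self_ip: str,
                      port: int = 9300) -> List[str]:
    """Return the host:port addresses this node should try to contact.

    Only running instances are kept, and the node skips its own address.
    The port is an assumption for this sketch.
    """
    return sorted(
        f"{inst.private_ip}:{port}"
        for inst in instances
        if inst.state == "running" and inst.private_ip != self_ip
    )


if __name__ == "__main__":
    # Stand-in for the listing a cloud compute API would return.
    listing = [
        Instance("i-1", "10.0.0.1", "running"),
        Instance("i-2", "10.0.0.2", "running"),
        Instance("i-3", "10.0.0.3", "terminated"),
    ]
    print(discovery_targets(listing, self_ip="10.0.0.1"))
```

Because every node runs the same query against the same listing, none of them has to be configured as a special router.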
Here is a simple configuration for Amazon (works on Rackspace as well...) that stores the data on S3 and uses EC2 to discover nodes: