Input file with custom delimiter

Input file with custom delimiter

Gopimanikandan Sengodan
Hi All,

We are planning to load data into Elasticsearch from a delimited file.

The file is delimited with the 0x88 (ˆ) character.

Can you please let me know how to load this delimited file into Elasticsearch?

Also, please let me know the best and fastest way to load millions of records into Elasticsearch.


SAMPLE:

XXXXXˆYYYYYYˆZZZZ

Thanks,
Gopi


Re: Input file with custom delimiter

dadoonet
Have a look at logstash. It will help you here.
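For example, a minimal pipeline along these lines should get you started. The file path, column names, and output below are placeholders, and the csv filter's separator has to be set to the actual delimiter character (it defaults to a comma); exact option names can vary between Logstash versions:

  input {
    file {
      path => "/path/to/input.txt"        # placeholder path
      start_position => "beginning"
    }
  }

  filter {
    csv {
      separator => "ˆ"                    # the 0x88 / ˆ character from the sample
      columns => ["field1", "field2", "field3"]   # placeholder column names
    }
  }

  output {
    stdout { codec => rubydebug }         # swap in the elasticsearch output once the split looks right
  }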

My 2 cents.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Re: Input file with custom delimiter

Gopimanikandan Sengodan
Hi David,

Thanks for your suggestion.

I have tried Logstash, but the delimiter is not working: the data ends up in a single column instead of being split into multiple columns.


Re: Input file with custom delimiter

InquiringMind
In reply to this post by Gopimanikandan Sengodan
Gopi,

You really have a CSV file, but one that uses ^ instead of , as its delimiter.

I happened to write my own CSV-to-JSON converter, giving it the options I needed (including specification or auto-detection of numbers, date-format normalization, auto-creation of the action-and-metadata line, and so on). I did this before stumbling across Logstash, but I still find it easier to write and maintain this code myself.

Choose whichever language you wish: I wrote one version of mine in C++ and the subsequent version in Java. I also wrote a bulk-load client in Java to avoid the limitations of curl (and its complete absence on various platforms).

(Logstash is much better for log files; my converter is much better for generic CSV.)

I know this isn't exactly the pre-written tool you are looking for, but converting the CSV (with an option to override the delimiter) into JSON isn't very hard to do. Once that's done, it's an easy matter to add the action and metadata and have a bulk-ready data stream, as shown below.
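For illustration, each document in such a stream is one action-and-metadata line followed by one source line; the index, type, and field names here are made up:

  { "index" : { "_index" : "myindex", "_type" : "mytype", "_id" : "1" } }
  { "field1" : "XXXXX", "field2" : "YYYYYY", "field3" : "ZZZZ" }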

Brian


Re: Input file with custom delimiter

Gopimanikandan Sengodan

Thank you, Brian. Let me change it accordingly per your suggestion. Would it be possible to share your bulk-load client and CSV-to-JSON converter?


Re: Input file with custom delimiter

InquiringMind
I wish I could, but I'm currently prohibited from sharing it. However, I can point you to some very good Java libraries.

The CSV parser supplied by the Apache Commons project (Commons CSV) works well. You can override the delimiter using the static CSVFormat.newFormat(char delimiter) method, which creates a new CSV format with the specified delimiter.

Then use an XContentBuilder (created via jsonBuilder()) to convert each record to single-line JSON; a rough sketch follows.
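As a rough sketch of those two pieces together (the file name, character encoding, and field names below are assumptions, not something from the original post):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import org.elasticsearch.common.xcontent.XContentBuilder;

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

public class CsvToJson
{
  public static void main(String[] args) throws IOException
  {
    // 0x88 decodes to "ˆ" (U+02C6) under windows-1252; adjust to the file's real encoding
    Reader in = new InputStreamReader(new FileInputStream("input.txt"), "windows-1252");

    // newFormat() builds a CSVFormat whose only setting is the custom delimiter
    CSVParser parser = CSVFormat.newFormat('\u02C6').parse(in);

    for (CSVRecord record : parser)
    {
      // One single-line JSON document per input record; field names are made up
      XContentBuilder cb = jsonBuilder();
      cb.startObject();
      cb.field("field1", record.get(0));
      cb.field("field2", record.get(1));
      cb.field("field3", record.get(2));
      cb.endObject();
      System.out.println(cb.string());
    }

    parser.close();
  }
}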

For example, the action-and-metadata object I use is based on the following enum and toString method that emits it as JSON. I've left out the parts I use in other custom libraries that allow Java code to easily set up this information, and also to set it from a search response or a get-by-id response:

  // Imports assumed by this excerpt (Elasticsearch 1.x client):
  //   import java.io.IOException;
  //   import org.elasticsearch.common.unit.TimeValue;
  //   import org.elasticsearch.common.xcontent.XContentBuilder;
  //   import org.elasticsearch.index.VersionType;
  //   import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

  public enum OpType
  {
    CREATE,
    INDEX,
    DELETE
  }

  @Override
  public String toString()
  {
    try
    {
      XContentBuilder cb = jsonBuilder();
      cb.startObject();

      cb.field(opType.toString().toLowerCase());
      cb.startObject();

      cb.field("_index", index);
      cb.field("_type", type);
      if (id != null)
        cb.field("_id", id);

      if (version > 0)
      {
        cb.field("_version", version);
        if (versionType == VersionType.EXTERNAL)
          cb.field("_version_type", "external");
      }

      if (ttl != null)
        cb.field("_ttl", ttl);

      cb.endObject();

      cb.endObject();
      return cb.string();
    }
    catch (IOException e)
    {
      return ("null");
    }

  }

  /* Operation type (action): "create" or "index" or "delete" */
  private OpType      opType      = OpType.INDEX;

  /* Metadata that this object supports */
  private String      index       = null;
  private String      type        = null;
  private String      id          = null;
  private long        version     = 0;
  private VersionType versionType = VersionType.INTERNAL;
  private TimeValue   ttl         = null;

And the actual data line that would follow is similarly constructed using the content builder.
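Once each action line is paired with its data line, one bare-bones way to ship a batch to the _bulk endpoint using only the JDK is sketched below; the host, port, and batch contents are assumptions:

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class BulkSender
{
  // POST a batch of newline-delimited action/data lines to Elasticsearch.
  // The batch string must end with a trailing newline.
  public static void sendBulk(String batch) throws IOException
  {
    URL url = new URL("http://localhost:9200/_bulk");   // placeholder host and port
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);

    OutputStream out = conn.getOutputStream();
    out.write(batch.getBytes("UTF-8"));
    out.close();

    // 200 means the request was accepted; per-item errors are reported in the response body
    System.out.println("HTTP " + conn.getResponseCode());
    conn.disconnect();
  }
}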

I wish I could help you more.

Brian



Re: Input file with custom delimiter

Gopimanikandan Sengodan

Thank you so much, Brian. I will make use of this in my project.
