How to Parse JSON and CSV Data in Elasticsearch?


The ELK Stack is a great tool for visualizing logging data. Systems generate a huge number of log entries every second, and it is not easy to pull the necessary information straight out of raw logs, so we need some kind of analytics, visualization, and search tool. The ELK Stack does all of this.

The system and our applications do not write logs in the same format, so we may have different kinds of data in different formats: log lines containing JSON values, CSV-formatted logs, and other styles. In this blog, I am going to show how to index and visualize these different kinds of logging data with the ELK Stack.

First, you have to set up the ELK Stack on your system. The main part of this is configuring filters in Logstash, so I will show you how to configure grok and related filters for different kinds of logging data.
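For reference, a Logstash pipeline configuration roughly follows the sketch below. The file path, Elasticsearch host, and index name here are assumptions for illustration only; adjust them to your setup.

input {
  file {
    path => "/var/log/myapp/app.log"        # assumed log file location
    start_position => "beginning"
  }
}

filter {
  # grok / json / csv filters go here (covered below)
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]      # assumed local Elasticsearch
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}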

Sample Data:

2019-10-01 00:00:10     357294
{"status":"true","additionalStatus":"OK","name":"mark","email":"gmail@mark.com"}

From this sample data, we have to index the status, name, and email values. You can see that those values sit inside the JSON part of the line. Let's see how to extract them with the grok and json filters.

filter {
  grok {
    match => {
      "message" => "%{TIMESTAMP_ISO8601:date}\t%{NUMBER:id}\t%{GREEDYDATA:jsondata}"
    }
  }
  # Parse the JSON string captured by grok into the "doc" field
  json {
    source => "jsondata"
    target => "doc"
  }
  # Copy the values we want to index into top-level fields
  mutate {
    add_field => {
      "status" => "%{[doc][status]}"
      "email"  => "%{[doc][email]}"
    }
  }
  # Drop events whose JSON part could not be parsed
  if "_jsonparsefailure" in [tags] {
    drop { }
  }
}

First, we define the pattern for message under match; here we describe the data exactly as it appears in the log file. I used the following grok pattern:

"%{TIMESTAMP_ISO8601:date}\t%{NUMBER:id}\t%{GREEDYDATA:jsondata}"

It says the message starts with a timestamp, followed by a tab and a number (the id), and everything after the next tab is the JSON payload: GREEDYDATA captures all of the remaining text into the jsondata field.
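If you run the sample line through this match (for example with a stdout { codec => rubydebug } output while testing), the event should end up with roughly the following fields. This is a sketch, assuming the fields in the log line are tab-separated as the pattern expects:

{
    "date"     => "2019-10-01 00:00:10",
    "id"       => "357294",
    "jsondata" => "{\"status\":\"true\",\"additionalStatus\":\"OK\",\"name\":\"mark\",\"email\":\"gmail@mark.com\"}"
}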

Next, we configure the json filter after the grok match. It is a simple configuration with a source and a target: the source is jsondata, which takes the string captured by the grok match, and the parsed object is stored under the doc field as the target.

After that, we can pull the values we want out of the doc field into our own fields for indexing. The mutate filter helps with this; we can define custom fields here with add_field.

I defined status and email with "%{[doc][status]}" and "%{[doc][email]}" respectively.
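Once the json and mutate filters have run, the indexed document should contain roughly the following fields (again a sketch, not exact Logstash output):

{
    "doc"    => {
        "status"           => "true",
        "additionalStatus" => "OK",
        "name"             => "mark",
        "email"            => "gmail@mark.com"
    },
    "status" => "true",
    "email"  => "gmail@mark.com"
}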

Sometimes we store logs in CSV files, and ELK has an option to index those as well. A simple csv config in the filter will extract and index the data smoothly.
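Assume, for illustration, that each line of the CSV log looks like the following made-up sample, matching the columns used in the filter below:

ACC1001,2019-10-01,gmail@mark.com,mark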

filter {
  csv {
    # Map each comma-separated value in the line to the corresponding column name
    columns => ["Account", "date", "email", "name"]
    separator => ","
    skip_empty_columns => "true"
  }
}

Just listing the column names under columns does the magic: the csv filter maps each value in the line to the matching field.
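With that configuration, the sample CSV line above would be parsed into roughly the following fields (a sketch):

{
    "Account" => "ACC1001",
    "date"    => "2019-10-01",
    "email"   => "gmail@mark.com",
    "name"    => "mark"
}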

I hope you learned something interesting in this blog. I will come up with more later.