A few months back, I had the opportunity to conduct two workshops on this topic at BSidesDelhi and CSI Mumbai. Both sessions were great experiences and showed me the growing interest among information security folks in the open-source ELK stack.
For the uninitiated, the ELK stack is primarily made up of three components – Elasticsearch (E), Logstash (L), and Kibana (K). There are other pieces such as Beats, modules, and plugins, but we shall leave those aside for now.
Elasticsearch is the main data store, analytics and search engine. Logstash handles the log integration, processing, parsing and enrichment. Kibana is a visualization layer sitting on top of Elasticsearch. It allows us to perform search queries and aggregations and create visualizations using a simple web UI.
The ELK stack is a very flexible platform and it has been used for multiple use-cases across different industries. In the Information Security domain, it is usually compared with the Splunk platform.
Some of our use-cases of the ELK stack include:
- Threat Hunting
- Log Management
- Security Monitoring
- Forensic Log Investigations
The common theme across all these use-cases is that the ELK platform lets you search, aggregate, and visualize logs in a significantly better manner than manual or CLI-based review. It doesn’t hurt that it does so at blazingly fast speeds 😊
For this blogpost, I’ll demonstrate how we can use the ELK stack to analyze a set of Apache web server logs and identify potentially malicious actors or traffic patterns.
We shall use a partial log data set from an investigation we did for a client.
The data will flow in the following manner: Filebeat → Logstash → Elasticsearch → Kibana.
Filebeat is, in simple terms, a log shipper. We define the path of our log file in Filebeat, and it ships the data to Logstash (or directly to Elasticsearch if needed).
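As a rough sketch, a minimal Filebeat configuration for this setup might look like the following (the log path, host, and port are assumptions for illustration):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/httpd/access_log   # assumed Apache access-log path
output.logstash:
  hosts: ["localhost:5044"]         # assumed Logstash host:port
```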
In Logstash, we receive the logs sent by Filebeat and parse out the relevant fields using the grok filter (grok is a regex-based pattern-extraction mechanism).
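To illustrate what grok does under the hood, here is a minimal Python sketch that pulls structured fields out of a raw log line using named regex groups (the sample line is fabricated for illustration; in Logstash itself you would simply use the built-in `%{COMBINEDAPACHELOG}` pattern):

```python
import re

# Named groups play the role of grok's %{PATTERN:field} captures.
APACHE_COMMON = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ '        # visitor IP, identd, user
    r'\[(?P<timestamp>[^\]]+)\] '        # request timestamp
    r'"(?P<verb>\S+) (?P<request>\S+) \S+" '  # method, path, protocol
    r'(?P<response>\d+) (?P<bytes>\S+)'  # status code, response size
)

# Illustrative sample line (203.0.113.7 is a documentation address)
line = ('203.0.113.7 - - [10/Oct/2018:13:55:36 +0000] '
        '"GET /index.html HTTP/1.1" 200 2326')

fields = APACHE_COMMON.search(line).groupdict()
print(fields["clientip"], fields["verb"], fields["response"])
```

Each raw line becomes a dictionary of fields – exactly the structure that Logstash forwards to Elasticsearch for indexing.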
To improve our analysis, we can further enrich the events by adding geo-location details for the source IP address (the website visitor) and converting the user-agent string into known buckets of operating systems and browsers.
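Putting these stages together, a minimal Logstash pipeline might look like the sketch below (ports, hosts, and the index name are assumptions for illustration):

```
input {
  beats { port => 5044 }
}

filter {
  # parse Apache combined-format access logs
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  # use the request timestamp as the event timestamp
  date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
  # enrich with geo-location of the visitor IP
  geoip { source => "clientip" }
  # bucket the raw user-agent string into OS/browser fields
  useragent { source => "agent" target => "useragent" }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]          # assumed Elasticsearch host
    index => "weblogs-%{+YYYY.MM.dd}"    # assumed index name
  }
}
```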
The full Logstash configuration is available on our Gitlab repo.
Once we have parsed the events correctly, we can ship the logs to Elasticsearch for ingestion.
In Kibana we must create an ‘index pattern’ so that Kibana can retrieve the necessary indexed data.
Once we see some logs being ingested in the ‘Discover’ tab in Kibana, we can start building our visualizations.
Visualizations are an important component of Kibana. A visual representation of the data helps the analyst/investigator understand the layout of the log events and identify outliers or anomalies much more quickly than CLI-based analysis. As is commonly said, a picture is worth a thousand words.
So, to start, we will first build a pie chart to understand the distribution of log events. We pick the HTTP method field, which has a finite set of possible values.
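Under the hood, a pie chart like this is simply a terms aggregation. For the curious, the equivalent raw Elasticsearch query might look like this (the index name and field name assume the grok-parsed fields from earlier):

```
GET weblogs-*/_search
{
  "size": 0,
  "aggs": {
    "http_methods": {
      "terms": { "field": "verb.keyword" }
    }
  }
}
```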
As is seen from the visualization above, we arrive at the expected chart – showing a majority of requests using the GET HTTP method.
We can start some preliminary analysis of the dataset at the ‘Visualization’ stage itself. However, much of the context and analytics builds up at the ‘Dashboard’ stage, where we can visually correlate different chunks of data from within the log data-set.
We go ahead and create additional visualizations, such as:
Once the necessary visualizations are created, we collect all of them into a single dashboard for analysis and correlation.
Now, this is where Elasticsearch, and especially Kibana, shines. What would take hours of analysis (in, say, Excel or on the CLI) can be completed in a few minutes.
First, we notice that our data contains some operational/garbage entries (indicated by ::1 in the clientip field). These were simply performance/application monitoring scripts running scheduled tasks, so we can exclude them across all visualizations by clicking the (-) icon. That reduces the data by ~85k events.
Next, we look at the HTTP methods, and a quick glance reveals our first outlier. The HTTP PUT method is considered ‘dangerous’ and is not expected to be used in production environments. Filtering on this method leaves us with two events.
OK – not much to go on. But since this IP was trying to PUT a text file on our server, let’s see what else it did or managed to achieve. So, we remove the filter for the PUT method and apply a filter on the IP address alone – that’s about two clicks away 😊
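In the Kibana query bar, these pivots can be written as simple queries (203.0.113.7 is a placeholder; the actual source address is withheld):

```
verb : "PUT"
clientip : "203.0.113.7"
```

The first query isolates the two PUT events; swapping it for the second shows everything that address sent us.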
So, we found that this IP sent 42 requests to our server, none of which appear legitimate (HEAD and PUT requests only?). Additionally, our user-agent (UA) analysis visualization places the browser in the ‘Other’ bucket, which indicates the visitor is not browsing our site with a traditional browser.
Looking at the raw requests, we can see that the attacker is trying to identify an application plugin called ‘fckeditor’ on our website.
Luckily for us, it doesn’t exist on our site. However, it should be noted that FCKeditor had a few known code-execution and file-upload vulnerabilities, and that is most likely what the attacker was trying to hit.
We could now go a step further and see who else tried hitting our server with the fckeditor probe.
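A quick way to do that is to drop the IP filter and instead search on the request path (assuming the grok-parsed request field from earlier):

```
request : *fckeditor*
```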
Interesting! So, we do have others trying to hit us with the Fckeditor exploit.
We could go on in this manner, picking individual data points in the different visualizations that we prepared and identify outliers in them.
I hope I was able to demonstrate the capabilities of ELK in this example. If you’re interested in learning more about the ELK stack and our use-cases, contact us for our professional services and training programs around threat hunting and security analytics.