This is Part 5 of my tutorial series on ELK on CentOS 7.

  • Part 1 - The Foundation
  • Part 2 - Elasticsearch
  • Part 3 - Kibana
  • Part 4 - Logstash
  • Part 5 (this article) - Filebeat with Apache and Nginx

In this tutorial we will also work on Logstash, since we need to implement some filters to make sure the logs from your web servers are parsed properly.

Important Note: I have altered the Logstash configuration in Part 4 considerably compared to a previous version. The main change revolves around index templates and the use of index naming schemes. If you followed Part 4 prior to April 4, 2019, you may want to go back and check out the changes, since everything in this tutorial relies on the new version.

Install Filebeat

In the last part, we already created the certificates that allow us to securely send logs from a server to Logstash.

Now we need to install Filebeat on the host from which you want to send logs to your ELK cluster. All installation types are described here.

By the way, you don't need Java installed for Filebeat.
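
For example, on CentOS 7 you can install Filebeat from Elastic's official yum repository. The sketch below assumes the 7.x repository; if you installed a different major version of Elasticsearch in the earlier parts, adjust the version accordingly:

$ sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
$ sudo tee /etc/yum.repos.d/elastic.repo <<'EOF'
[elastic-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
$ sudo yum install filebeat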

Copy Cert Files

If you've followed my tutorial so far, you should have created some certificates on your Logstash server, which you should find in /etc/elk-certs unless you used a different folder.

If you haven't already, copy the files beat.crt, beat.key, and ca.crt to the host(s) from which you want to send logs to Logstash. Place them in the folder /etc/elk-certs (you will need to create the folder first).

Below is an example workflow. I've used placeholder names for users and hosts, which you need to replace with your own. I'm also assuming that you're using a sudo user on your web server:

sammy = your sudo user name on the Logstash server
logstash.elkdomain.com = the hostname of your Logstash server
/etc/elk-certs = the folder on your Logstash server which contains the certificates you created in the previous tutorial

On your web-server:

$ sudo mkdir /etc/elk-certs
$ sudo scp -r  sammy@logstash.elkdomain.com:"/etc/elk-certs/beat.crt /etc/elk-certs/beat.key /etc/elk-certs/ca.crt" "/etc/elk-certs/"

Make sure you copy all three files. If the above doesn't work for you, just use an SFTP client such as Cyberduck to download the files from the Logstash server to your local machine and then upload them to the web server.
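
Once the files are in place, it doesn't hurt to verify that all three arrived and to tighten the permissions on the private key (Filebeat runs as root by default, so root ownership is fine):

$ ls -l /etc/elk-certs
$ sudo chmod 600 /etc/elk-certs/beat.key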

Configure Filebeat

The configuration is fairly simple. First we should enable the correct module. Filebeat comes with some preconfigured modules which make it easy to collect logs from common applications. You can enable a module with `filebeat modules enable [module]`.

For Nginx:

$ sudo filebeat modules enable nginx

For Apache:

$ sudo filebeat modules enable apache2

A full list of available modules can be displayed with:

$ sudo filebeat modules list

Next, go to /etc/filebeat and open the file filebeat.yml:

$ cd /etc/filebeat
$ sudo nano filebeat.yml

Find the Logstash output section and make sure it looks like this (replace the host with your Logstash host, and make sure the default Elasticsearch output section is commented out):

output.logstash:
  hosts: ["elastic1.letter22.co:5044"]
  ssl.certificate_authorities: ["/etc/elk-certs/ca.crt"]
  ssl.certificate: "/etc/elk-certs/beat.crt"
  ssl.key: "/etc/elk-certs/beat.key"

Filebeat is now configured. To run it, use the usual systemd commands:

$ sudo systemctl enable filebeat  # this will make sure it runs at startup
$ sudo systemctl start filebeat
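
Filebeat also ships with built-in test commands, which are handy to confirm that the configuration is valid and that the TLS connection to Logstash works. Note that the output test will only succeed once Logstash is actually running and listening on port 5044, which we take care of below:

$ sudo filebeat test config
$ sudo filebeat test output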

Back to Logstash

Log on to the console of your Logstash machine. We now need to set up the proper filters for your logs. First, let's do Apache2:

Apache2 Filters on Logstash:

On your Logstash server, go to /etc/logstash/webserver.conf.d and create a new file:

$ cd /etc/logstash/webserver.conf.d
$ sudo nano apache_filter.conf

Enter this configuration:

filter {
  if [fileset][module] == "apache2" {
    if [fileset][name] == "access" {
      grok {
        match => { "message" => ["%{IPORHOST:[apache2][access][remote_ip]} - %{DATA:[apache2][access][user_name]} \[%{HTTPDATE:[apache2][access][time]}\] \"%{WORD:[apache2][access][method]} %{DATA:[apache2][access][url]} HTTP/%{NUMBER:[apache2][access][http_version]}\" %{NUMBER:[apache2][access][response_code]} %{NUMBER:[apache2][access][body_sent][bytes]}( \"%{DATA:[apache2][access][referrer]}\")?( \"%{DATA:[apache2][access][agent]}\")?",
          "%{IPORHOST:[apache2][access][remote_ip]} - %{DATA:[apache2][access][user_name]} \\[%{HTTPDATE:[apache2][access][time]}\\] \"-\" %{NUMBER:[apache2][access][response_code]} -" ] }
        remove_field => "message"
      }
      mutate {
        add_field => { "read_timestamp" => "%{@timestamp}" }
      }
      date {
        match => [ "[apache2][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
        remove_field => "[apache2][access][time]"
      }
      useragent {
        source => "[apache2][access][agent]"
        target => "[apache2][access][user_agent]"
        remove_field => "[apache2][access][agent]"
      }
      geoip {
        source => "[apache2][access][remote_ip]"
        target => "[apache2][access][geoip]"
      }
    }
    else if [fileset][name] == "error" {
      grok {
        match => { "message" => ["\[%{APACHE_TIME:[apache2][error][timestamp]}\] \[%{LOGLEVEL:[apache2][error][level]}\]( \[client %{IPORHOST:[apache2][error][client]}\])? %{GREEDYDATA:[apache2][error][message]}",
          "\[%{APACHE_TIME:[apache2][error][timestamp]}\] \[%{DATA:[apache2][error][module]}:%{LOGLEVEL:[apache2][error][level]}\] \[pid %{NUMBER:[apache2][error][pid]}(:tid %{NUMBER:[apache2][error][tid]})?\]( \[client %{IPORHOST:[apache2][error][client]}\])? %{GREEDYDATA:[apache2][error][message1]}" ] }
        pattern_definitions => {
          "APACHE_TIME" => "%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}"
        }
        remove_field => "message"
      }
      mutate {
        rename => { "[apache2][error][message1]" => "[apache2][error][message]" }
      }
      date {
        match => [ "[apache2][error][timestamp]", "EEE MMM dd H:m:s YYYY", "EEE MMM dd H:m:s.SSSSSS YYYY" ]
        remove_field => "[apache2][error][timestamp]"
      }
    }
  }
}

Now, this filter looks wild, but it's actually not that complicated. It works with the preconfigured Kibana dashboards provided by Elastic and creates layered field names, which makes it much easier to process the events in Kibana. For details, check out Elastic's page on this.
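
To illustrate with a made-up example: an access log line such as

203.0.113.42 - - [04/Apr/2019:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 3421 "-" "Mozilla/5.0"

would be turned into nested fields like [apache2][access][remote_ip] = 203.0.113.42, [apache2][access][method] = GET, [apache2][access][response_code] = 200 and [apache2][access][body_sent][bytes] = 3421, plus the user agent and GeoIP information added by the later filter stages.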

Nginx

Analogous to what we did for Apache, we can now create a filter configuration for Nginx on our Logstash server:

$ sudo nano /etc/logstash/webserver.conf.d/nginx_filter.conf

Put the following content into the file (config source):

filter {
  if [fileset][module] == "nginx" {
    if [fileset][name] == "access" {
      grok {
        match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
      }
      mutate {
        add_field => { "read_timestamp" => "%{@timestamp}" }
      }
      date {
        match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
        remove_field => "[nginx][access][time]"
      }
      useragent {
        source => "[nginx][access][agent]"
        target => "[nginx][access][user_agent]"
        remove_field => "[nginx][access][agent]"
      }
      geoip {
        source => "[nginx][access][remote_ip]"
      }
    }
    else if [fileset][name] == "error" {
      grok {
        match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] }
      }
      mutate {
        rename => { "@timestamp" => "read_timestamp" }
      }
      date {
        match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ]
        remove_field => "[nginx][error][time]"
      }
    }
  }
}

Now we should have a fully configured Logstash instance which can process both Apache and Nginx log files.
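
Before starting the service, you can optionally let Logstash check the whole pipeline configuration for syntax errors. This is a sketch assuming the default RPM paths used in Part 4:

$ sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit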

Let's make sure Logstash will be started at system boot, and then fire up the beast:

$ sudo systemctl enable logstash.service
$ sudo systemctl start logstash.service

To make sure everything is running smoothly, let's check the daemon:

$ sudo lsof -Pni | grep logstash

The output should look something like this:

java     15411      logstash   61u  IPv6 2028031      0t0  TCP 127.0.0.1:34850->127.0.0.1:9200 (ESTABLISHED)
java     15411      logstash  100u  IPv6 2028036      0t0  TCP *:5044 (LISTEN)
java     15411      logstash  103u  IPv6 2028046      0t0  TCP 127.0.0.1:9600 (LISTEN)

If you run into any trouble, log in as root and check out the log files in /var/log/logstash.
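
On the web server side, the Filebeat logs are just as useful. Assuming you installed Filebeat from the RPM package, you can follow them with journald:

$ sudo journalctl -u filebeat -f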

If everything worked out, your logs should now show up in Elasticsearch. Fire up Kibana and check under Discover whether you can see anything.
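
If nothing shows up in Discover, a quick way to check whether the indices exist at all is to ask Elasticsearch directly (run this on one of your Elasticsearch hosts; the index names depend on the naming scheme you configured in Part 4):

$ curl -XGET 'localhost:9200/_cat/indices?v'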

Kibana Sample Dashboards

This part can be a bit tricky if you're not using the index naming scheme and Logstash filters I use in the config examples above.

The Kibana sample dashboards, which are actually part of the Filebeat package, rely on the index template that comes with Filebeat. It is possible to change the template during or after installation, but it's a painful and frustrating process, and I would recommend avoiding it unless really needed.

So this process assumes you are using the above-mentioned configuration.

The easiest way to install the dashboards is to also install Filebeat on one of your Elasticsearch hosts. So go back up and install Filebeat as described above.

After the installation, skip the certificate steps. We will not use Logstash in this case, but instead send data directly to Elasticsearch.
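
For the dashboard setup to work, the filebeat.yml on that host only needs to know where Elasticsearch and Kibana live. Here is a minimal sketch, with placeholder hostnames you need to replace with your own, using the Elasticsearch output instead of the Logstash output:

setup.kibana:
  host: "kibana.elkdomain.com:5601"

output.elasticsearch:
  hosts: ["elastic1.elkdomain.com:9200"]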

Once installed, simply run the following command:

$ sudo filebeat setup --dashboards

That's it.

One important note: when you update Filebeat, even to a minor version, you will have to redo this step, since otherwise the index templates and dashboards will no longer work properly.

Now head over to Kibana and explore the data.

Conclusion

This was a long one, but we now have a running environment which is able to collect Nginx and Apache log files. This concludes the first tutorial series for Elastic. Up next will be a few useful tips for running ELK. Stay tuned.