Picture: Designed by Freestockcenter
In this tutorial series I will try to give a comprehensive guide on how to get Elasticsearch up and running first on one machine, then in a cluster. We will use Elasticsearch to collect webserver access logs to enable you to analyse web traffic from your web server(s), running Nginx and/or Apache2.
I will also show you a few other examples such as CSV inputs or Twitter feeds and how to make sense of that kind of data in Kibana.
I have split the tutorial into multiple parts, since it would be too much for one post. In contrast to the many great tutorials on Elasticsearch that I've found and actually used myself, I've changed a few things around and introduced a couple of what I think are good practices and tweaks to the configuration and setup. I also wanted to make sense of the extensive information on Elastic's documentation website, which can be confusing for a beginner (and for an advanced user as well), and I hope to provide some context for specific use cases.
Enough said, let's dive in.
- Part 1 (this page) - Operating System, Java and Tweaks
- Part 2 - Elasticsearch
- Part 3 - Kibana
- Part 4 - Logstash with Nginx
When I speak of Elasticsearch in the headline, I actually mean an ELK stack. ELK stands for Elasticsearch, Logstash and Kibana, which are the core components of the stack.
When running an ELK stack, you need to consider a few options before diving into the whole thing. Not necessarily what you will need it for, since the idea of ELK is to have a multi-purpose system that can be scaled out in a cluster.
For an ELK stack, I use CentOS 7. ELK is typical enterprise software, and in the enterprise world Red Hat is by far the most used distribution. CentOS 7 is pretty much the free version of that, so the choice is easy.
Of course Ubuntu is a good choice, too (as most other distros are), but I did run into a few issues around the Elasticsearch deb packages and the Java environment.
I would not recommend running a window manager or desktop environment on the server, unless of course you're using Microsoft Windows, which is out of scope for this tutorial.
No matter where you look and ask, the typical answer to the question of what you need to run ELK is "it depends" (which is also what you get for Hadoop, by the way...).
In this case, for the tutorial, I am using a virtual machine hosted with my favorite provider Hetzner, which is originally from Germany but has data centers around the world. Of course, DigitalOcean is always a good choice, too. I like both of them since they have a good front-end and decent pricing models. (Disclaimer: I'm not paid by them for saying this, and if you follow the links above, there's no commission I get. Maybe that will change later, but I will of course state the change.)
The machine I use to write this tutorial is a simple 2 virtual core CPU, 4 GByte RAM, 20 TByte SSD setup, which is absolutely enough for this purpose. Once you run a little bit more than just a couple of log files you will need to scale up.
As mentioned above, I will use CentOS 7 for this tutorial because overall I think it is the most reasonable choice for my environment. But that might not be true for you, so choose based on your preferences and experience. If you don't have any, just follow along with CentOS.
First, after the basic installation, let's make sure all security updates are installed. Log in as root and enter:
# yum update -y
Next, create a regular user with sudo rights so you don't have to work as root. You don't have to do this, but it's best practice and I always do it, so at least I gotta show you how:
# adduser [username]
# passwd [username]
# usermod -aG wheel [username]
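If you want to confirm that the new user really ended up in the wheel group, a small check like the one below can help. This is only a sketch: `in_group` is a helper name I made up for illustration, and it simply looks for an exact match in the space-separated group list that `id -nG` prints.

```shell
# in_group GROUP "LIST": succeed if GROUP appears as a whole word
# in a space-separated list of group names (hypothetical helper).
in_group() {
    for g in $2; do
        [ "$g" = "$1" ] && return 0
    done
    return 1
}

# Example call (replace [username] with the account created above):
#   in_group wheel "$(id -nG [username])" && echo "sudo rights are in place"
```

The loop compares whole names, so a group called `wheelhouse` would not be mistaken for `wheel`.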
I know, some of you can't stand to not use vi, but some of us are just more into nano. So if you like, go ahead and get it:
$ sudo yum install nano -y
Optional: Enable automatic system updates
Next, I recommend enabling automatic updates. If you'd rather not do this, skip this step.
$ sudo yum install yum-cron -y
$ sudo nano /etc/yum/yum-cron.conf
Change the following parameters. Note that they are probably not right next to each other; there should be some comments in between:
update_cmd = security
download_updates = yes
apply_updates = yes
emit_via = email
system_name = [choose_your_identifier]
Save and close, and restart yum-cron:
$ sudo systemctl restart yum-cron
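Since the parameters are scattered through the file, it is easy to miss one. The following sketch reads the values back so you can double-check your edit. The function names `conf_value` and `check_yum_cron` are my own, not part of yum-cron, and the check only covers the three update-related settings from above.

```shell
# conf_value FILE KEY: print the value assigned to KEY in an INI-style file.
# Commented lines start with '#', so they never match the anchored pattern.
# If KEY is assigned more than once, the last assignment is printed.
conf_value() {
    sed -n "/^[[:space:]]*$2[[:space:]]*=/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*\$//;p;}" "$1" | tail -n 1
}

# check_yum_cron FILE: succeed only if the three update settings are as expected.
check_yum_cron() {
    [ "$(conf_value "$1" update_cmd)" = "security" ] &&
    [ "$(conf_value "$1" download_updates)" = "yes" ] &&
    [ "$(conf_value "$1" apply_updates)" = "yes" ]
}

# Example call:
#   check_yum_cron /etc/yum/yum-cron.conf && echo "automatic security updates are configured"
```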
lsof comes in handy to check network sockets, so I will use it quite often in this tutorial.
$ sudo yum install lsof -y
Install Oracle Java 8
Based on the Elastic Support Matrix, it seems Oracle Java is the one you want to go with.
First, we should get the rpm. You will have to head over to the Oracle download site and copy the correct link, which should be the x64 package unless you're still running a 32-bit system.
$ sudo yum install wget -y
$ cd ~
$ wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "[link-you-just-copied]"
$ sudo yum localinstall jre-8u192-linux-x64.rpm -y
Note that the version 8u192 might change in the future, so make sure you type the correct file name.
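To avoid typing the file name by hand, you could derive it from the link you copied. This is a sketch under the assumption that the rpm name is the last path segment of the URL, possibly followed by a query string; `rpm_name` is a hypothetical helper, not something Oracle or yum provides.

```shell
# rpm_name URL: print the file name portion of a download URL.
rpm_name() {
    name=${1##*/}       # drop everything up to and including the last '/'
    name=${name%%\?*}   # drop a query string, if the link carries one
    printf '%s\n' "$name"
}

# Example call, using the link from the Oracle download site:
#   url="[link-you-just-copied]"
#   sudo yum localinstall "$(rpm_name "$url")" -y
```

If wget saved the file under a different name (some download links carry extra parameters), check with `ls` first and use whatever name is actually on disk.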
Check the version:
$ java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)
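If you want to script this check, for example in a provisioning script, something along these lines might work. It is a rough sketch: both function names are made up for illustration, and it assumes the first line of the output contains the version in double quotes, as in the output above. Keep in mind that `java -version` writes to stderr, hence the `2>&1` in the example call.

```shell
# parse_java_version: read `java -version` output on stdin
# and print the quoted version string from it.
parse_java_version() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# java_major VERSION: print the major release for a version string.
java_major() {
    case $1 in
        1.*) v=${1#1.}; printf '%s\n' "${v%%.*}" ;;   # legacy scheme, e.g. 1.8.0_192
        *)   printf '%s\n' "${1%%[._-]*}" ;;           # newer scheme, e.g. 11.0.2
    esac
}

# Example call:
#   ver=$(java -version 2>&1 | parse_java_version)
#   [ "$(java_major "$ver")" = "8" ] || echo "unexpected Java version: $ver"
```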
If you have multiple Java installations - which you shouldn't - you can configure the default with this:
$ sudo alternatives --config java
You should set the JAVA_HOME environment variable, since many packages, including Elasticsearch, will use it.
$ sudo sh -c "echo export JAVA_HOME=/usr/java/jre1.8.0_192-amd64/ >> /etc/environment"
Again, make sure you use the right path by checking that the directory actually exists on your machine. Then source the file:
$ source /etc/environment
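To find the right path and to verify the result, the two helpers below may be useful. They are a sketch with names of my own choosing, and they assume the usual layout where the java binary lives in `bin/` directly below the Java home directory.

```shell
# home_from_binary PATH: strip the trailing /bin/java from a resolved java path.
home_from_binary() {
    dirname "$(dirname "$1")"
}

# java_home_ok DIR: succeed only if DIR contains an executable bin/java.
java_home_ok() {
    [ -n "$1" ] && [ -x "${1%/}/bin/java" ]
}

# Example calls:
#   home_from_binary "$(readlink -f "$(command -v java)")"   # prints the directory to use
#   java_home_ok "$JAVA_HOME" && echo "JAVA_HOME looks valid"
```

`readlink -f` follows the symlinks that the alternatives system sets up, so the first call prints the real installation directory rather than `/usr/bin`.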
You should now have a running system with the most recent security patches and Oracle Java 8 installed.