Practical Linux Server Review
Purpose
This document is the companion to the SadServers Server Review Scenario. Create a server there and try to meet the objectives described here on your own first.
This is an objective-based practical challenge, where you can learn or train with specific results in mind (the “why”) rather than focusing on the specific “how”. This exercise is also part of the DevOps Upskill Challenge.
Some solutions (specific commands) are proposed for each objective, but they are not the only or the best commands; other commands and options are possible, and the solutions are not complete or exhaustive.
The commands below are examples (some are becoming obsolete or are distro-dependent); you should make your own list and practice them so they become muscle memory.
The commands are divided into three sections:
- In blue, the purpose of the server, what it does (what ports are exposed).
- In red, how saturated or busy its hardware resources are.
- In green, the possible server and application errors.
Objective 1: Characterize the Server
Find what the server does, that is, its purpose (e.g., a web server, a database server, a data processing server, etc.).
Reveal a Solution
One way to find the purpose of a server is to see what services are exposed (i.e., what ports are open with a process serving them):
sudo netstat -tlpn # becoming obsolete
sudo ss -tlpn # more modern (same flags!)
Of course, we can use ps to list all processes as well:
sudo ps auxf # lots of flag options, play with them
From the netstat command we get:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 601/sshd: /usr/sbin
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 625/postgres
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN 885/haproxy
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 584/python3
tcp6 0 0 :::22 :::* LISTEN 601/sshd: /usr/sbin
tcp6 0 0 :::8080 :::* LISTEN 566/gotty
In the Local Address column we can see which services are exposed to the outside, that is, listening on all interfaces (IPv4 address 0.0.0.0 or IPv6 :::). These services are:
- SSH, exposed on its default port :22
- HAProxy (a web load balancer) on port :8000
- An application, “Gotty”, on port :8080 (this is the terminal-to-web application that lets you run commands from a web interface; you can disregard it).
In addition, the following processes are bound locally (to 127.0.0.1, and thus accessible from inside the host itself but not exposed to the outside):
- A python3 application on port :9000
- A Postgres database on its default port :5432
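The exposed-versus-local split described above can also be scripted. The following is a minimal sketch, not part of the scenario: classify_listeners is a hypothetical helper name, and it assumes the local address is the fourth whitespace-separated field, as in the netstat output shown earlier.

```shell
#!/usr/bin/env bash
# classify_listeners (hypothetical helper): read netstat -tln style lines on
# stdin and print "exposed PORT" or "local PORT" for each listening socket.
classify_listeners() {
  awk '$4 ~ /:[0-9]+$/ {
    addr = $4
    port = addr
    sub(/.*:/, "", port)                                   # text after the last colon
    host = substr(addr, 1, length(addr) - length(port) - 1)
    # Loopback addresses are reachable only from the host itself.
    scope = (host ~ /^127\./ || host == "::1" || host == "[::1]") ? "local" : "exposed"
    print scope, port
  }' | LC_ALL=C sort -u
}

# Sample input copied from the netstat output above.
classify_listeners <<'EOF'
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 601/sshd: /usr/sbin
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 625/postgres
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN 885/haproxy
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 584/python3
tcp6 0 0 :::22 :::* LISTEN 601/sshd: /usr/sbin
tcp6 0 0 :::8080 :::* LISTEN 566/gotty
EOF
```

For the sample above it prints exposed 22, exposed 8000, exposed 8080, local 5432 and local 9000, one per line. On a live host the input would come from something like sudo netstat -tln | classify_listeners.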
Objective 2: Find the Server Hardware Utilization
Find the hardware saturation for CPU, RAM, disk and network, and whether there is a problem with any of these resources.
Reveal a Solution
We can use top (or the friendlier htop, which is also installed) to find current memory and CPU utilization.
From htop or lscpu we can see there are two processors (two CPUs); the webapp.py process takes about 47-48% of all (both) CPUs and postgres takes about 10-11% of total CPU time.
The command uptime shows load averages (a measure of how many processes are running or waiting for CPU time). With two processors, a load average above 2.0 indicates high utilization, and that is what we see here.
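To make the "load versus number of CPUs" comparison explicit, here is a small sketch. load_per_cpu is a hypothetical helper, and the 2.50 value in the usage note is only an illustration, not a reading from the scenario server.

```shell
#!/usr/bin/env bash
# load_per_cpu (hypothetical helper): divide a load average by the CPU count.
# A result above 1.00 means more runnable work than CPUs to run it on.
load_per_cpu() {
  awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# On a live Linux host, use the 1-minute load average and the CPU count.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
  read -r one_min _ < /proc/loadavg
  echo "1-minute load per CPU: $(load_per_cpu "$one_min" "$(nproc)")"
fi
```

For example, load_per_cpu 2.50 2 prints 1.25, meaning the two CPUs have about 25% more work queued than they can run at once.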
For memory we can use:
free -m
vmstat 2 2
To check if the server has run out of memory (OOM) we can try any of:
sudo dmesg | grep -i "out of memory"
sudo grep -i "out of memory" /var/log/syslog
sudo journalctl | grep -i "out of memory"
sudo grep -Ei "oom|killed process" /var/log/syslog # -i: the kernel writes "Killed process" capitalized
(This can also be done as part of the log review process).
We do find that the system went OOM once.
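If you want a count rather than a list of matching lines, the searches above can be wrapped in a small filter. This is a sketch under the assumption that OOM reports contain "out of memory" or "oom-killer" (case-insensitive); count_oom_events is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# count_oom_events (hypothetical helper): count stdin lines that look like
# kernel OOM reports. grep -c exits non-zero when the count is 0, so "|| true"
# keeps the function from signalling an error in that case.
count_oom_events() {
  grep -ciE 'out of memory|oom-killer' || true
}

# Usage on a live host (any of the log sources mentioned above), e.g.:
#   sudo dmesg | count_oom_events
#   sudo journalctl -k | count_oom_events
```

Note that a single OOM event usually produces several matching lines, so the number is an upper bound on events rather than an exact count.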
Zombie processes? Look for processes in state Z in top. (Processes blocked on I/O show state D instead; see the disk checks below.)
For disk utilization we can check disk space with df -h. We see there is only one disk, shown as partition nvme0n1p1 (we can use lsblk to get disk details), which is about 37% full.
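A disk-space check like this is easy to turn into a threshold filter. The sketch below is an illustration only: filesystems_over is a hypothetical helper, the 80% limit is an arbitrary choice, and it assumes the POSIX df -P layout (usage in the fifth column, mount point in the sixth, no spaces in mount points).

```shell
#!/usr/bin/env bash
# filesystems_over (hypothetical helper): read "df -P" output on stdin and
# print "MOUNTPOINT USE%" for every filesystem at or above the given limit.
filesystems_over() {
  awk -v limit="$1" 'NR > 1 {          # skip the header line
    use = $5
    sub(/%/, "", use)                  # "37%" -> "37"
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Report filesystems that are at least 80% full (prints nothing if none are).
df -P | filesystems_over 80
```

With the 37% usage seen on this server, a limit of 80 would print nothing for that partition.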
For disk I/O (Input/Output) we can run iostat and look at the Device statistics. We can also use sar -d 1 3; both commands show there’s some disk utilization.
Using sudo iotop we can see that the applications writing the most to disk are the systemd logging daemon (systemd-journald) and the Postgres database (its WAL writer). Other processes (like the web app or haproxy) don’t write directly to disk.
Going back to the top or htop screens: if any process were in state D (“S” column), that would mean it is in Uninterruptible Sleep, typically waiting for I/O operations to complete, such as disk or network. We don’t see any processes in the D state, so even though the disks are in use, they are not currently a bottleneck for the processes.
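Instead of scanning the top screen by eye, the state column can be filtered from ps output. This is a minimal sketch; procs_in_state is a hypothetical helper, and it assumes three columns (state, PID, command name) with no header line.

```shell
#!/usr/bin/env bash
# procs_in_state (hypothetical helper): read "STATE PID COMMAND" lines on
# stdin and print "PID COMMAND" for processes whose state starts with the
# requested letter (D = uninterruptible sleep, Z = zombie).
procs_in_state() {
  awk -v want="$1" 'substr($1, 1, 1) == want { print $2, $3 }'
}

# Usage on a live host (the "=" after each column name suppresses the header):
#   ps -e -o stat= -o pid= -o comm= | procs_in_state D
#   ps -e -o stat= -o pid= -o comm= | procs_in_state Z
```

Only the first character of the state is compared because ps may append modifiers to it (for example a plus sign for foreground processes).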
We can get some network statistics with sar -n DEV 1, for example.
Objective 3: Explore the Applications
Figure out what the services do and how they are connected to each other.
Reveal a Solution
HAProxy
We can check with curl http://localhost:8000, which returns a number.
With ps auxf we see that haproxy is spawned from containerd.
We verify the HAProxy Docker container and look at its properties with:
docker ps -a
docker inspect haproxy
From the inspect command we see a configuration file mounted at /home/admin/haproxy.cfg. Reviewing it, we see that haproxy at port :8000 is a web front-end that passes the HTTP requests to an application at :9000 (the python3 application we found from netstat).
Webapp
From netstat we can find the process ID and check it with ps. For example, if the PID is 580 we can run the following (the [] is a trick to avoid matching the grep command itself; quoting the pattern keeps the shell from treating it as a glob):
ps auxf | grep '[5]80'
We can see the command is: /usr/bin/python3 /home/admin/webapp.py
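Grepping for a PID can also match unrelated lines that happen to contain the same digits. As an alternative sketch, on Linux the command line of a single process can be read straight from /proc; cmd_of_pid is a hypothetical helper name, and the approach assumes /proc is mounted.

```shell
#!/usr/bin/env bash
# cmd_of_pid (hypothetical helper): print the full command line of one PID.
# /proc/<pid>/cmdline stores the arguments separated by NUL bytes, so they
# are translated to spaces for display.
cmd_of_pid() {
  tr '\0' ' ' < "/proc/$1/cmdline"
  echo
}

# Example: show the command line of the current shell.
if [ -r "/proc/$$/cmdline" ]; then
  cmd_of_pid "$$"
fi
```

On the scenario server you would pass the PID reported by netstat or ss instead of $$. Reading another user's process details may require sudo.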
So we can inspect the code at /home/admin/webapp.py. We see that the script connects to the postgres database, increments the number of requests and returns it.
This script runs like a service or daemon (continuously in the background) and it reads the postgres login credentials from a .env file in the same directory.
We can connect to it directly (bypassing HAProxy): curl http://localhost:9000
Postgres
There’s a PostgreSQL service running on port :5432.
To review the existing databases we can do:
sudo su
su - postgres
psql -l
We can see that besides the default postgres database and templates, there’s the webapp database referenced in the Python webapp.py file. We can connect to it (continuing from the previous commands, as the postgres user) and explore it:
psql -d webapp
\d+
\d traffic
\q
Systemd
We saw that HAProxy runs from a Docker container, so Docker manages it (it will be restarted upon reboot of the server, for example). What about the web app and Postgres? They are probably managed by Systemd, and in any case it’s always a good idea to check all the services active via Systemd:
sudo systemctl list-units
sudo systemctl list-unit-files
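The full unit list is long, so it can help to narrow it to running services. The sketch below assumes the usual systemctl list-units column order (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION); running_services is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# running_services (hypothetical helper): read "systemctl list-units" rows on
# stdin and print the unit name of every service whose SUB state is "running".
running_services() {
  awk '$1 ~ /\.service$/ && $4 == "running" { print $1 }'
}

# Usage on a live host (--plain and --no-legend give one clean row per unit):
#   systemctl list-units --type=service --no-legend --plain | running_services
```

systemctl can also do this filtering itself with --state=running; the awk version is shown because it works on saved output as well.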
Other
There is also an application, wrk (managed by Systemd); this is a stress tool that creates the requests to HAProxy.
Architecture Summary
The whole architecture is then: wrk requests -> HAProxy (in Docker) -> web app (Python application managed by Systemd) -> Postgres (database managed by Systemd).
Objective 4: Review OS and Application Logs
Look at Linux system logs and at the logs of the services running to check if there are any critical issues.
Reveal a Solution
We can check Systemd (journalctl) logs:
journalctl -p err # filter for error logs
journalctl -u webapp # logs for the unit (service) webapp
journalctl -u postgresql # postgres logs
For HAProxy, which is Dockerized: docker logs haproxy
We can also inspect the system logs at /var/log with less or tail:
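less /var/log/messages # general system log on RHEL-family distros
less /var/log/syslog # general system log on Debian/Ubuntu
less /var/log/secure # authentication log on RHEL-family distros (/var/log/auth.log on Debian/Ubuntu)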
The main issue the logs reveal is the out-of-memory (OOM) event.
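When the error log is long, a per-program tally shows quickly where the problems cluster. This is a rough sketch, assuming the default journalctl short output format (month, day, time, host, then program[pid]:); errors_by_program is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# errors_by_program (hypothetical helper): read journalctl short-format lines
# on stdin and print "COUNT PROGRAM", most frequent first.
errors_by_program() {
  awk '!/^--/ && NF >= 5 {             # skip "-- ... --" marker lines
    prog = $5
    sub(/\[[0-9]+\]:?$/, "", prog)     # strip a trailing [pid] and colon
    sub(/:$/, "", prog)                # strip a bare trailing colon
    count[prog]++
  }
  END { for (p in count) print count[p], p }' | sort -rn
}

# Usage on a live host:
#   journalctl -p err --no-pager | errors_by_program
```

Program names that contain spaces would be split incorrectly, so treat the tally as a starting point for reading the actual log lines.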