Practical Linux Server Review

Purpose

This document is the companion for the SadServers Server Review Scenario. Create a server there and try to complete the objectives described here on your own first.

This is an objective-based practical challenge, where you learn or train with specific results in mind (the “why”) rather than focusing on the specific “how”. This exercise is also part of the DevOps Upskill Challenge.

Some solutions (specific commands) are proposed for each objective, but they are neither the only nor the best commands, and they are not a complete or exhaustive solution.

See the commands below for examples (some are becoming obsolete or are distro-dependent); you should make your own list and practice the commands until they become muscle memory.

The commands are divided into three sections:

In blue, the purpose of the server, what it does (what ports are exposed).
In red, how saturated or busy its hardware resources are.
In green, the possible server and application errors.

[Figure: one-minute Linux server review]

Objective 1: Characterize the Server

Find what the server does, that is, its purpose (e.g., a web server, a database server, a data processing server, etc.).

Solution

One way to find the purpose of a server is to see what services are exposed (i.e., what ports are open with a process serving them):

sudo netstat -tlpn  # becoming obsolete
sudo ss -tlpn  # more modern (same flags!)

Of course we can use ps to list all processes as well:

sudo ps auxf  # lots of flag options, play with them

From the netstat command we get:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      601/sshd: /usr/sbin 
tcp        0      0 127.0.0.1:5432          0.0.0.0:*               LISTEN      625/postgres        
tcp        0      0 0.0.0.0:8000            0.0.0.0:*               LISTEN      885/haproxy         
tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      584/python3         
tcp6       0      0 :::22                   :::*                    LISTEN      601/sshd: /usr/sbin
tcp6       0      0 :::8080                 :::*                    LISTEN      566/gotty

We can see in the Local Address column what services are exposed to the outside, on all interfaces (IPv4 address 0.0.0.0 or IPv6 address ::, which appears as :::port). These services are:

SSH exposed on its default port :22
HAProxy (a web load balancer) on port :8000
An application, “Gotty”, on port :8080 (this is the terminal-to-web application that lets you run commands from a web interface; you can disregard it).

In addition, the following processes are bound locally (to 127.0.0.1, and thus accessible only from inside the host itself, not exposed to the outside):

A python3 application on port :9000
A Postgres database on its default port :5432
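The exposed-versus-local distinction above can be checked mechanically. The following is a minimal sketch rather than part of the scenario: classify_listeners is a hypothetical helper name, and it assumes input in the header-less column layout printed by ss -Htln, where the fourth field is the local address and port.

```shell
# Hypothetical helper: label each listening socket as exposed or local-only.
# Reads lines in "ss -Htln" format on stdin (4th field = local address:port).
classify_listeners() {
    awk '{
        addr = $4
        port = addr
        sub(/.*:/, "", port)       # keep only the text after the last colon
        sub(/:[^:]*$/, "", addr)   # drop the trailing ":port"
        # Loopback addresses are reachable only from the host itself.
        scope = (addr ~ /^127\./ || addr == "[::1]") ? "local-only" : "exposed"
        print scope, port
    }'
}

# Live usage (assumes the iproute2 "ss" tool is installed):
#   ss -Htln | classify_listeners
```

On a server like the one above, ports 22, 8000 and 8080 would be expected to come out as exposed, and 5432 and 9000 as local-only.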

Objective 2: Find the Server Hardware Utilization

Find the hardware saturation for CPU, RAM, disk and network, and determine whether there’s a problem with any of these resources.

Solution

We can use top (or the friendlier htop, which is also installed) to find the current memory and CPU utilization.

From htop or lscpu we can see there are two processors (two CPUs). In htop, the webapp.py process takes about 47-48% of the combined capacity of both CPUs, and postgres takes about 10-11% of total CPU time.

The uptime command shows load averages (on Linux, a measure of how many processes are running, waiting for CPU time, or blocked in uninterruptible I/O). With two processors, a load average above 2.0 indicates high utilization, which is what we see here.
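To make the comparison between load average and CPU count explicit, the load can be normalized per CPU. This is a small sketch under stated assumptions: load_per_cpu is a hypothetical helper name, and the live usage relies on /proc/loadavg and nproc being available, as they are on typical Linux systems.

```shell
# Hypothetical helper: divide a load average by the number of CPUs.
# A result above 1.00 suggests more runnable work than the CPUs can absorb.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Live usage on Linux (1-minute load average and online CPU count):
#   load_per_cpu "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```

For instance, a load average of 3.0 on two CPUs gives 1.50, meaning roughly half a CPU's worth of work is queued at any moment.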

For memory we can use:

free -m
vmstat 2 2
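As a complement to free, the share of memory still available can be computed directly from /proc/meminfo. This is only a sketch: mem_available_pct is a hypothetical helper name, and it assumes a kernel recent enough to report the MemAvailable field.

```shell
# Hypothetical helper: percentage of RAM still available to applications.
# Reads /proc/meminfo-style text on stdin (values are in kB).
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# Live usage on Linux:
#   mem_available_pct < /proc/meminfo
```

A low percentage here, together with swap activity in vmstat, is a hint to go looking for OOM events.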

To check if the server has run out of memory (OOM) we can try any of:

sudo dmesg | grep -i "out of memory"
sudo grep -i "out of memory" /var/log/syslog
sudo journalctl | grep -i "out of memory"
sudo grep -iE "oom|killed process" /var/log/syslog

(This can also be done as part of the log review process).

We do find that the system went OOM once.
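Once an OOM event is confirmed, it helps to know which process was killed. The sketch below is not from the scenario: oom_victims is a hypothetical helper name, and it assumes the kernel message contains the text "Killed process <pid> (<name>)", as recent kernels print; other kernel versions may word it differently.

```shell
# Hypothetical helper: print "PID name" for each process the OOM killer ended.
# Assumes kernel lines containing "Killed process <pid> (<name>)".
oom_victims() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Live usage (reading the kernel ring buffer usually needs root):
#   sudo dmesg | oom_victims
```

Lines that do not match the pattern are silently skipped, so the same filter can be fed from /var/log/syslog or journalctl output.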

Zombie processes? Look for processes in state Z in top (blocked processes show state D instead).

For disk utilization we can check disk space with df -h. We see there is only one disk, nvme0n1p1 (we can use lsblk to get disk details), which is about 37% full.

For disk I/O (Input/Output) we can run iostat and look at the Device statistics. We can also use sar -d 1 3; both commands show there’s some disk utilization.

Using sudo /usr/sbin/iotop we can see that the applications writing the most to disk are the systemd logging service (systemd-journald) and the Postgres database (its WAL writer). Other processes (like the web app or haproxy) don’t write directly to disk.

Going back to the top or htop screens, any process in state D (“S” column) would be in Uninterruptible Sleep, typically waiting for I/O operations such as disk or network to complete. We don’t see processes in the D state, so even though the disks are in use, they are not currently a bottleneck for the processes.
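Instead of scanning the top screen by eye, processes in a given state can be filtered from ps output. This is a minimal sketch: procs_in_state is a hypothetical helper name, and the live usage assumes the procps version of ps, which provides the state column. Command names containing spaces are truncated to their first word.

```shell
# Hypothetical helper: list "PID command" for processes in a given state.
# Reads "ps -eo state= -o pid= -o comm=" output on stdin; the first
# argument is the one-letter state to keep.
procs_in_state() {
    awk -v want="$1" '$1 == want { print $2, $3 }'
}

# Live usage (procps "ps"): D = uninterruptible sleep, Z = zombie
#   ps -eo state= -o pid= -o comm= | procs_in_state D
```

An empty result for D matches the observation above that no process is currently blocked on I/O.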

We can get some network statistics with sar -n DEV 1 for example.

Objective 3: Explore the Applications

Figure out what the services do and how they are connected to each other.

Solution

HAProxy

We can check with curl http://localhost:8000, which returns a number.

With ps auxf we see that haproxy is spawned from containerd.

We verify the HAProxy Docker container and look at its properties with:

docker ps -a
docker inspect haproxy

From the inspect command we see a configuration file mounted at /home/admin/haproxy.cfg. Reviewing it, we see that haproxy on port :8000 is a web front-end that passes the HTTP requests to an application on :9000 (the python3 application we found with netstat).
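When reviewing an HAProxy configuration, the bind and server directives are usually enough to see where it listens and where it forwards. The sketch below is one way to pull them out; haproxy_endpoints is a hypothetical helper name, and the actual contents of the scenario's file are not reproduced here.

```shell
# Hypothetical helper: show only the listening and forwarding lines of an
# HAProxy configuration read on stdin ("bind" and "server" directives).
haproxy_endpoints() {
    grep -E '^[[:space:]]*(bind|server)[[:space:]]'
}

# Live usage with the path reported by "docker inspect haproxy":
#   haproxy_endpoints < /home/admin/haproxy.cfg
```

The bind lines should account for port :8000, and the server lines should point at the application on :9000.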

Webapp

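From netstat we can find the process ID and check it with ps. For example, if the PID is 580 we can run the following (the [] is a trick to avoid returning the grep command itself; the quotes keep the shell from treating the brackets as a filename pattern):

ps auxf | grep '[5]80'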
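The reason the bracket trick works can be shown without a live server. In this illustrative sketch the two lines in sample are made up to stand in for ps output, using the same example PID of 580.

```shell
# Two simulated "ps" lines: the real process and the grep command itself.
sample='admin 580 /usr/bin/python3 /home/admin/webapp.py
admin 999 grep [5]80'

# The regex [5]80 matches the text "580", but grep's own command line
# contains the literal characters "[5]80", which the regex does not match.
printf '%s\n' "$sample" | grep '[5]80'   # prints only the first line
```

Where it is installed, pgrep is an alternative that avoids the issue entirely, since it never reports itself.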

We can see the command is: /usr/bin/python3 /home/admin/webapp.py

So we can inspect the code at /home/admin/webapp.py. We see that the script connects to the postgres database, increments the request count, and returns the new number of requests.

This script runs as a service or daemon (continuously, in the background) and reads the postgres login credentials from a .env file in the same directory.

We can connect to it directly (bypassing HAProxy): curl http://localhost:9000

Postgres

There’s a PostgreSQL service running on port :5432.

To review the existing databases we can do:

sudo su
su - postgres
psql -l

We can see that besides the default postgres database and templates, there’s the webapp database referenced in the Python webapp.py file. We can connect to it (still as the postgres user from the previous commands) and explore it:

psql -d webapp
\d+
\d traffic
\q

Systemd

We saw that HAProxy runs in a Docker container, so Docker manages it (it will restart after a server reboot, for example). What about the web app and Postgres? They are probably managed by Systemd, and in any case it’s always a good idea to check all active services via Systemd:

sudo systemctl list-units
sudo systemctl list-unit-files
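The full unit list is long, so it can help to narrow it to the services that are actually running. A small sketch, assuming the list-units filters shown in the usage comment; service_names is a hypothetical helper name.

```shell
# Hypothetical helper: extract service unit names from
# "systemctl list-units --plain --no-legend" output read on stdin.
service_names() {
    awk '$1 ~ /\.service$/ { print $1 }'
}

# Live usage: only services that are currently running
#   systemctl list-units --type=service --state=running --plain --no-legend | service_names
```

Each name in the output can then be inspected further with systemctl status or systemctl cat.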

Other

There is also an application, wrk (managed by Systemd). This is a stress tool that generates the requests to HAProxy.

Architecture Summary

The whole architecture is then: wrk requests -> HAProxy (in Docker) -> web app (Python application managed by Systemd) -> Postgres (database managed by Systemd)

Objective 4: Review OS and Application Logs

Look at Linux system logs and at the logs of the services running to check if there are any critical issues.

Solution

We can check Systemd (journalctl) logs:

journalctl -p err  # filter for error logs
journalctl -u webapp  # logs for the unit (service) webapp
journalctl -u postgresql  # postgres logs
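When the error log is long, a per-program tally shows where to look first. The following is a sketch, not a definitive tool: errors_by_program is a hypothetical helper name, and it assumes journalctl's default short output format, in which the fifth field is the program name followed by an optional PID in brackets.

```shell
# Hypothetical helper: count log lines per program, busiest first.
# Assumes journalctl's default "short" output on stdin, where the 5th
# field looks like "name[pid]:" (or "name:" when there is no PID).
errors_by_program() {
    awk '$1 != "--" && NF >= 5 {
             name = $5
             sub(/\[.*/, "", name)   # strip "[pid]:"
             sub(/:$/, "", name)     # strip a bare trailing colon
             count[name]++
         }
         END { for (n in count) print count[n], n }' | sort -rn
}

# Live usage:
#   journalctl -p err --no-pager | errors_by_program
```

Marker lines that journalctl prints between boots start with "--" and are skipped by the filter.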

For HAProxy, which is Dockerized: docker logs haproxy

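We can also inspect the system logs under /var/log with less or tail (which files exist depends on the distro):

less /var/log/messages  # RHEL-family distros
less /var/log/syslog  # Debian-family distros
less /var/log/secure  # RHEL-family distros (authentication logs)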

The main issue the logs reveal is the out-of-memory (OOM) event.