A server you can’t see is a server you can’t trust. Most incidents — a memory leak, a disk filling up, a process pegging the CPU — are visible in the data well before they cause an outage. The question is whether you’re looking. This guide covers the tools you need, how to read what they’re telling you, and how to set up monitoring that doesn’t require you to check manually.
:::note[TL;DR]
- CPU: `top`, `htop`, `mpstat` — look at load average relative to CPU count
- Memory: `free -h`, `vmstat` — watch for swap usage growing over time
- Disk: `df -h`, `du -sh`, `iotop` — full disks kill servers silently
- Persistent monitoring: Netdata or Prometheus + Grafana for dashboards and alerts
- One-liner health check: `uptime && free -h && df -h`
:::
## Prerequisites

- Linux server (Ubuntu 22.04/24.04 or Debian 12)
- SSH access
- Most tools here are pre-installed; some require `apt install`
## How do you check CPU usage?

Quick snapshot with `top`:

```bash
top
```
Key fields to read:

- `load average: 0.5, 1.2, 0.8` — CPU load over 1, 5, and 15 minutes. A healthy server has a load average below its CPU core count. A 4-core server with a load of 6.0 is overloaded.
- `%Cpu(s): 25.0 us, 5.0 sy, 0.0 ni, 68.0 id` — user, system, nice, and idle percentages. `id` (idle) below 20% means the CPU is under heavy load.
- `wa` (I/O wait) — if this is high (>10%), a slow disk is blocking processes, not the CPU itself.

Press `1` in top to see per-core stats. Press `P` to sort by CPU usage. Press `q` to quit.
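The load-vs-cores rule is easy to script. A minimal sketch (a hypothetical helper, assuming a Linux box where `nproc` and `/proc/loadavg` are available):

```bash
# Express the 1-minute load average as a share of total CPU capacity
cores=$(nproc)
load1=$(cut -d ' ' -f1 /proc/loadavg)
# awk handles the floating-point division
awk -v l="$load1" -v c="$cores" 'BEGIN {
  printf "load %.2f on %d cores: %.0f%% of capacity\n", l, c, (l / c) * 100
}'
```

Anything consistently near or above 100% of capacity deserves investigation.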
### Better: htop

```bash
sudo apt install htop -y
htop
```

`htop` is `top` with color, mouse support, and easier navigation. `F6` sorts by column. `F9` sends signals to processes. `F10` quits.
CPU stats per core over time:

```bash
# mpstat is part of the sysstat package: sudo apt install sysstat -y
mpstat -P ALL 2 5
# CPU stats for all cores, every 2 seconds, 5 samples
```
Find what’s eating CPU right now:

```bash
# Sort processes by CPU, show top 10
ps aux --sort=-%cpu | head -10
```
## How do you check memory usage?

Quick overview:

```bash
free -h
```

Output:

```
              total        used        free      shared  buff/cache   available
Mem:           7.7G        2.1G        1.2G        256M        4.4G        5.1G
Swap:          2.0G          0B        2.0G
```
The `available` column is what matters — it includes memory that can be reclaimed from `buff/cache`. `free` memory alone is misleading because Linux uses spare memory for caching disk reads; that cached memory is available to applications on demand.
Concerning signs:

- `available` approaching zero
- Swap `used` growing over time (swap use itself isn’t bad; growing swap use means you’re leaking memory)
- `available` continuing to drop while `free` is already low
Watch memory over time:

```bash
watch -n 2 free -h
# Refreshes every 2 seconds
```
Find which processes are using the most memory:

```bash
ps aux --sort=-%mem | head -10
```
Detailed memory breakdown:

```bash
cat /proc/meminfo
```

Key fields: `MemAvailable`, `SwapTotal`, `SwapFree`, `Cached`, `Buffers`.
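Those fields can be turned into a single alertable number. A sketch that reads `MemAvailable` and `MemTotal` directly (Linux-only; `/proc/meminfo` values are in kB):

```bash
# Compute available memory as a percentage of total
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
pct=$(awk -v a="$avail_kb" -v t="$total_kb" 'BEGIN { printf "%.0f", (a / t) * 100 }')
echo "available: ${pct}% of total memory"
```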
## How do you check disk usage?

Disk space by filesystem:

```bash
df -h
```

Look for any filesystem near 100%. A full `/` will crash the server. A full `/var` often kills logging and databases. Set up an alert when any filesystem hits 80%.
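That 80% check can be sketched in a few lines of portable shell (`df -P` guarantees one line per filesystem with stable columns):

```bash
#!/bin/sh
# Warn about any filesystem above THRESHOLD percent full
THRESHOLD=80
df -P | awk -v t="$THRESHOLD" 'NR > 1 {
  use = $5
  sub(/%/, "", use)
  if (use + 0 > t) printf "WARNING: %s is %s%% full (mounted at %s)\n", $1, use, $6
}'
```

Anything it prints is a filesystem worth investigating; wire it into cron or a monitoring agent if you want it to run unattended.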
What’s using the space:

```bash
# Top-level directories in /var
sudo du -sh /var/* | sort -h

# Drill down into a specific directory
sudo du -sh /var/log/* | sort -h

# Find largest files anywhere on the system (du prints sizes so sort -h works)
sudo find / -type f -size +100M -exec du -h {} + 2>/dev/null | sort -h
```
Common culprits on production servers:

- `/var/log` — logs that aren’t rotating
- Docker storage at `/var/lib/docker` — old images and stopped containers
- Database data directories
- Application caches and temp files
Clean up Docker storage:

```bash
# Show Docker disk usage
docker system df

# Remove stopped containers, unused networks, and dangling images
docker system prune -f
# Add -a to also remove all unused (not just dangling) images

# Also remove volumes (destructive — check first)
docker system prune -f --volumes
```
### Disk I/O — is the disk the bottleneck?

```bash
sudo apt install iotop -y
sudo iotop -o
# -o shows only processes doing active I/O
```

In `top`, `wa` (I/O wait) above 5-10% is a signal that processes are waiting on disk reads or writes.
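If `iotop` isn’t installed, the iowait share can be derived straight from `/proc/stat`. A sketch that samples the aggregate CPU line twice, one second apart:

```bash
# Fields on the first line of /proc/stat: user nice system idle iowait ...
read -r _ u1 n1 s1 i1 w1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 w2 _ < /proc/stat
# Ticks elapsed in the window, and how many of them were iowait
total=$(( (u2 + n2 + s2 + i2 + w2) - (u1 + n1 + s1 + i1 + w1) ))
wait_ticks=$(( w2 - w1 ))
awk -v w="$wait_ticks" -v t="$total" 'BEGIN {
  printf "iowait: %.1f%%\n", (t > 0) ? (w / t) * 100 : 0
}'
```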
## How do you check network usage?

Active connections and listening ports:

```bash
ss -tlnp   # TCP listening sockets with process names
ss -s      # Summary stats
```
Network traffic in real time:

```bash
sudo apt install nload -y
nload eth0
# Substitute your interface name (see: ip link)
```
Per-connection bandwidth:

```bash
sudo apt install nethogs -y
sudo nethogs eth0
```
## How do you check what’s running?

All running services (systemd):

```bash
systemctl list-units --type=service --state=running
```
Recent service failures:

```bash
journalctl -p err -n 50
# Last 50 error-level log entries across all services
```
Specific service logs:

```bash
journalctl -u nginx -n 100 -f
# Follow nginx logs in real time
```
System uptime and load:

```bash
uptime
# 10:23:45 up 42 days, 3:12, 2 users, load average: 0.45, 0.38, 0.32
```
## How do you set up persistent monitoring?
Checking tools manually doesn’t catch problems that happen at 3am. You need something that runs continuously and alerts you.
### Option 1: Netdata — zero config, beautiful dashboards, alerting built in

```bash
bash <(curl -Ss https://my-netdata.io/kickstart.sh)
```

After installation, the dashboard is at `http://your-server-ip:19999`. Netdata comes with pre-configured alerts for CPU, memory, disk, and dozens of other metrics. No manual configuration is required to get useful monitoring.
### Option 2: Prometheus + Grafana — more complex, more powerful

Install `node_exporter` on each server to expose metrics:

```bash
# Download and install node_exporter
# (check https://github.com/prometheus/node_exporter/releases for the current version)
VERSION=1.8.2
wget https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz
tar xzf node_exporter-${VERSION}.linux-amd64.tar.gz
sudo mv node_exporter-${VERSION}.linux-amd64/node_exporter /usr/local/bin/

# Run as a service
sudo useradd -rs /bin/false node_exporter
sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<EOF
[Unit]
Description=Node Exporter

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=default.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
```
Metrics are now at `http://your-server-ip:9100/metrics`. Point Prometheus at that endpoint, then build Grafana dashboards from the collected data.
Prometheus + Grafana is the industry standard for self-hosted infrastructure monitoring — Grafana’s node exporter dashboard (ID 1860) gives you CPU, memory, disk, and network in one import.
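For reference, a minimal Prometheus scrape config pointing at that endpoint might look like this (a sketch; the target address is a placeholder):

```yaml
# prometheus.yml (minimal sketch)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['your-server-ip:9100']
```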
### Option 3: Simple cron-based disk alert

If a full monitoring stack is overkill, a cron job that emails you when disk usage is over 80% is better than nothing:

```bash
crontab -e
```

Add:

```bash
# Check disk at 8am every day
# Note: % is special in crontab entries and must be escaped as \%
# Requires working outbound mail (e.g. apt install mailutils)
0 8 * * * df -h | awk 'NR>1 {gsub(/\%/,""); if ($5 > 80) print "DISK ALERT: "$0}' | mail -s "Disk Alert $(hostname)" you@example.com
```
## One-liner server health check

A quick command to paste when you need a snapshot:

```bash
echo "=== UPTIME ===" && uptime && \
echo "=== CPU ===" && (mpstat 1 1 2>/dev/null || top -bn1 | grep "Cpu(s)") && \
echo "=== MEMORY ===" && free -h && \
echo "=== DISK ===" && df -h && \
echo "=== TOP PROCESSES ===" && ps aux --sort=-%cpu | head -6
```
## Summary

- Load average vs. CPU count tells you if the CPU is under pressure; `id` (idle) below 20% confirms it
- `available` memory (not `free`) is the real number to watch; growing swap use means a memory leak
- Disk health is critical — `df -h` for an overview, `du -sh` to drill down; a full `/var` kills logs and databases
- `iotop` for disk I/O bottlenecks; `nethogs` for per-process network usage
- Netdata is the fastest path to persistent monitoring with alerts; Prometheus + Grafana is the production standard
## FAQ

### My load average is 4.0 but the server feels fine. Is that bad?

It depends on your CPU count. A load average of 4.0 on a 4-core server means it’s running at capacity — no headroom. On an 8-core server, 4.0 means it’s at 50%. Run `nproc` to see your core count. A good rule of thumb: alert when load average > (CPU cores × 0.8) for more than 5 minutes.
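That rule of thumb translates directly into a few lines of shell. A sketch using the 5-minute load average from `/proc/loadavg`:

```bash
# Fire when the 5-minute load average exceeds 80% of core count
cores=$(nproc)
load5=$(cut -d ' ' -f2 /proc/loadavg)
awk -v l="$load5" -v c="$cores" 'BEGIN {
  if (l > c * 0.8) print "ALERT: load " l " exceeds 80% of " c " cores"
  else             print "OK: load " l " is within headroom on " c " cores"
}'
```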
### The server is slow but CPU and memory look fine. What else should I check?

I/O wait (the `wa` column in `top`) is the first thing to check. High I/O wait means processes are blocked waiting for the disk. Run `sudo iotop -o` to see which process is doing the I/O. After that, check the network with `ss -s` and `nethogs`. Finally, check database slow-query logs — a slow query running in a loop looks like CPU idle but causes all application threads to stack up.
### How do I find what deleted file is keeping my disk full?

When a file is deleted while a process still has it open, the space isn’t freed until the process releases it. `df` shows low free space, but `du` can’t see the deleted file, so the numbers disagree. Find the culprit:

```bash
sudo lsof +L1
# Lists all open files with link count 0 (deleted but held open)
```

The output shows which process is holding the file open. Restarting that process releases the disk space.
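If restarting the process isn’t an option, the space can often be reclaimed by truncating the deleted file through the process’s file descriptor (a sketch; `<PID>` and `<FD>` are placeholders taken from the `lsof` output):

```bash
# Truncate the still-open deleted file via /proc; frees the space immediately.
# Only safe for files like logs whose contents the process doesn't need back.
: > /proc/<PID>/fd/<FD>
```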
## What to read next
- Linux Bash Cheat Sheet — essential commands for server management
- SSH Hardening Guide — securing access before granting access to monitoring dashboards
- How to Set Up a Reverse Proxy with Nginx — proxy your monitoring dashboards securely