Table of Contents
- What is iostat?
- Installing sysstat (Prerequisite)
- iostat Command Syntax
- Key iostat Metrics Explained
- Practical iostat Examples
- What is vmstat?
- vmstat Command Syntax
- Key vmstat Metrics Explained
- Practical vmstat Examples
- Combining iostat and vmstat for Advanced Analysis
- Common Use Cases
- Troubleshooting Tips
- Conclusion
- References
What is iostat?
iostat is a command-line tool that generates reports about CPU utilization and input/output statistics for block devices (e.g., hard drives, SSDs, partitions). It is part of the sysstat package, a collection of system monitoring utilities.
Primary Use Cases:
- Identifying slow or overloaded disks.
- Measuring read/write throughput per device.
- Analyzing I/O queue lengths and service times.
- Correlating disk activity with CPU usage.
Installing sysstat (Prerequisite)
iostat (and sar, another useful tool) is included in the sysstat package. If it’s not pre-installed on your system, install it using your package manager:
Debian/Ubuntu:
sudo apt update && sudo apt install sysstat
RHEL/CentOS/Fedora:
sudo yum install sysstat # For RHEL/CentOS
sudo dnf install sysstat # For Fedora
After installation, verify iostat is available:
iostat -V
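Before a script relies on these tools, a small check along the following lines can confirm they are on the PATH. This is only a sketch: the function name check_tools is made up for illustration and is not part of sysstat or procps.

```shell
# Report which of the given commands are on PATH.
# Returns non-zero if at least one is missing.
# NOTE: check_tools is a hypothetical helper name, not a standard command.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use would be `check_tools iostat vmstat sar || echo "install sysstat/procps first"`.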
iostat Command Syntax
The basic syntax for iostat is:
iostat [options] [device] [interval] [count]
- Options: Customize output (e.g., -c for CPU stats, -d for disk stats, -x for extended stats).
- Device: Specify a particular block device (e.g., sda, nvme0n1) to monitor. Omit to see all devices.
- Interval: Time (in seconds) between reports.
- Count: Number of reports to generate. If an interval is given without a count, iostat reports continuously; with neither, it prints a single report.
Key iostat Metrics Explained
iostat outputs two main sections by default: CPU statistics and Device statistics. Use -x for extended disk metrics (highly recommended for deep I/O analysis).
CPU Statistics (from iostat or iostat -c)
| Column | Description | What to Look For |
|---|---|---|
| %user | CPU time spent on user-space processes. | High values may indicate application-level bottlenecks. |
| %nice | CPU time spent on processes with modified priority (nice). | Usually low; spikes may indicate priority-adjusted tasks. |
| %system | CPU time spent on kernel-space processes (system calls, I/O). | High values (>20%) may indicate kernel inefficiencies or excessive syscalls. |
| %iowait | CPU time waiting for I/O to complete (idle while waiting). | Critical: High values (>10%) suggest I/O bottlenecks (disk is slow). |
| %steal | CPU time stolen by the hypervisor (relevant for VMs). | High values (>5%) may indicate resource contention on the host. |
| %idle | CPU time idle (not user, system, or waiting for I/O). | Low values (<10%) suggest CPU saturation. |
Device Statistics (Basic: iostat -d; Extended: iostat -x)
Basic Disk Metrics
| Column | Description |
|---|---|
| Device | Name of the block device (e.g., sda, sdb1). |
| tps | Transactions per second (reads + writes, merged or not). |
| kB_read/s | Kilobytes read per second. |
| kB_wrtn/s | Kilobytes written per second. |
| kB_read | Total kilobytes read (since boot in the first report, during the interval in later reports). |
| kB_wrtn | Total kilobytes written (since boot in the first report, during the interval in later reports). |
Extended Disk Metrics (iostat -x)
| Column | Description | What to Look For |
|---|---|---|
| rrqm/s | Read requests merged per second (merged to reduce I/O operations). | High values indicate efficient I/O scheduling (good). |
| wrqm/s | Write requests merged per second. | Same as above for writes. |
| r/s | Read requests per second (after merging). | High values may indicate heavy read workloads. |
| w/s | Write requests per second (after merging). | High values may indicate heavy write workloads. |
| rkB/s | Kilobytes read per second (same as kB_read/s in basic mode). | Throughput metric; compare to device specs (e.g., 500MB/s for SSD). |
| wkB/s | Kilobytes written per second (same as kB_wrtn/s in basic mode). | Same as above for writes. |
| avgrq-sz | Average request size (in sectors, 1 sector = 512 bytes). Newer sysstat releases report areq-sz in kilobytes instead. | Large values (>200 sectors, about 100 kB) = sequential I/O; small = random I/O. |
| avgqu-sz | Average I/O queue length (named aqu-sz in newer sysstat releases). | Critical: Values >2-3 indicate I/O requests are queuing (disk is slow). |
| await | Average time (ms) for I/O requests to complete (queue + service time). | Critical: Values >20ms suggest slow I/O (mechanical disks may be higher). |
| r_await | Average time (ms) for read requests to complete. | Isolate read-specific latency. |
| w_await | Average time (ms) for write requests to complete. | Isolate write-specific latency. |
| svctm | Average service time (ms) per I/O request (deprecated, and removed from recent sysstat releases). | Use await instead; svctm is unreliable and does not account for queueing. |
| %util | Percentage of time the device was busy handling I/O requests. | Critical: Values >80% indicate the device is near saturation. On devices that serve requests in parallel (SSDs, RAID arrays), a high value alone does not prove saturation. |
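To apply the %util threshold from the table automatically, the report can be filtered with awk. The sketch below is one possible approach, not a sysstat feature: flag_busy_devices is a hypothetical helper name, and it locates the %util column by its header because column order differs between sysstat versions.

```shell
# Print "device %util" for every device whose %util exceeds a limit (default 80).
# Reads `iostat -x` style text on stdin.
# NOTE: flag_busy_devices is a hypothetical helper, not part of sysstat.
flag_busy_devices() {
  awk -v limit="${1:-80}" '
    $1 ~ /^Device/ {                 # header row: remember where %util is
      util = 0
      for (i = 1; i <= NF; i++) if ($i == "%util") util = i
      next
    }
    NF == 0 { util = 0; next }       # a blank line ends the device table
    util && ($util + 0) > limit { print $1, $util }
  '
}
```

A possible invocation is `LC_ALL=C iostat -dxy 5 3 | flag_busy_devices 80`, where LC_ALL=C keeps the decimal point as a dot and -y (where your sysstat version supports it) skips the since-boot report.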
Practical iostat Examples
1. Basic System Overview
Run iostat without options to see average CPU and disk activity since boot:
iostat
Sample Output:
Linux 5.4.0-100-generic (server) 09/20/2024 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
2.34 0.01 0.89 0.56 0.00 96.20
Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 5.23 45.62 120.34 123456 789012
nvme0n1 0.89 2.10 15.67 45678 98765
Key Takeaway: %iowait is low (0.56%), so I/O is not a bottleneck here. sda has higher throughput than nvme0n1.
2. Monitor Disk I/O Continuously
To track disk activity in real time (every 5 seconds, 10 reports total):
iostat -x 5 10
Focus Areas: Watch %util, await, and avgqu-sz for spikes. If %util >80% and avgqu-sz >3, the disk is likely overloaded. The first report shows averages since boot, so judge current load from the second report onward.
3. Analyze a Specific Device
Monitor only sda with extended metrics:
iostat -x sda 2
4. CPU-Only Report
Isolate CPU metrics (useful for checking %iowait without disk clutter):
iostat -c 3
What is vmstat?
vmstat (Virtual Memory Statistics) reports on system-wide statistics, including processes, memory, paging, block I/O, traps, and CPU usage. Unlike iostat, it does not focus on per-device disk stats but provides a holistic view of system health—making it ideal for identifying memory or CPU-related I/O issues.
Primary Use Cases:
- Detecting memory pressure (swapping, cache thrashing).
- Monitoring system-wide I/O wait (wa in CPU stats).
- Correlating paging activity with disk I/O.
vmstat Command Syntax
The basic syntax for vmstat is:
vmstat [options] [interval] [count]
- Options: -s (summary of memory stats), -d (disk I/O stats), -t (add timestamp).
- Interval: Time (seconds) between reports.
- Count: Number of reports (omit for continuous monitoring).
Key vmstat Metrics Explained
vmstat groups its output columns into six sections. Here’s what each means:
1. Processes (procs)
| Column | Description |
|---|---|
| r | Number of runnable processes (running or waiting for CPU time). |
| b | Number of processes blocked in uninterruptible sleep (typically waiting for disk I/O). |
What to Look For: A high b value (>2-3) indicates processes are stuck waiting for I/O, pointing to a disk bottleneck.
2. Memory (memory)
| Column | Description |
|---|---|
| swpd | Amount of virtual memory (swap) used (in kB). |
| free | Free physical memory (kB). |
| buff | Memory used for buffers (temporary storage for disk I/O). |
| cache | Memory used for page cache (files cached from disk). |
What to Look For:
- High swpd (>50% of total swap) may indicate memory pressure.
- Low free + high swpd suggests the system has been swapping, which causes heavy I/O. Confirm with si/so, since low free alone is normal when the page cache is large.
3. Swap (swap)
| Column | Description |
|---|---|
| si | Swap in (kB/s): Data read from swap to memory (paging in). |
| so | Swap out (kB/s): Data written from memory to swap (paging out). |
What to Look For: Sustained si/so >0 indicates active swapping, leading to increased disk I/O.
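To check whether swapping is sustained rather than a one-off, the si and so columns can be counted across several samples. The following is a minimal sketch: count_swap_samples is a hypothetical helper, and it assumes the default vmstat layout where the column-name row starts with "r b".

```shell
# Count vmstat samples that show swap traffic (si + so greater than 0).
# Reads vmstat output on stdin and prints the number of such samples.
# NOTE: count_swap_samples is a hypothetical helper, not part of procps.
count_swap_samples() {
  awk '
    $1 == "r" && $2 == "b" {         # column-name row: locate si and so
      for (i = 1; i <= NF; i++) {
        if ($i == "si") si = i
        if ($i == "so") so = i
      }
      next
    }
    si && so && $1 ~ /^[0-9]+$/ && ($si + $so) > 0 { n++ }
    END { print n + 0 }
  '
}
```

For example, `vmstat 5 12 | count_swap_samples` covers one minute. Keep in mind that the first sample is the since-boot average, so a single hit may only reflect past swapping.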
4. I/O (io)
| Column | Description |
|---|---|
| bi | Blocks received from a block device per second (read; vmstat counts 1024-byte blocks, i.e. KiB/s). |
| bo | Blocks sent to a block device per second (write; vmstat counts 1024-byte blocks, i.e. KiB/s). |
What to Look For: bi/bo correlate with disk activity. Spikes here may align with high %iowait in CPU stats.
5. System (system)
| Column | Description |
|---|---|
| in | Interrupts per second (including clock interrupts). |
| cs | Context switches per second (process/thread switches). |
What to Look For: High cs (>10k/s) may indicate excessive process switching, increasing CPU overhead.
6. CPU (cpu)
| Column | Description | What to Look For |
|---|---|---|
| us | Time spent on user-space processes. | High values (>70%) may indicate CPU-bound applications. |
| sy | Time spent on kernel-space processes. | High values (>30%) may indicate kernel inefficiencies. |
| id | Idle time (not user, system, or waiting for I/O). | Low values (<10%) = CPU saturation. |
| wa | Time waiting for I/O (equivalent to %iowait in iostat). | Critical: High wa (>10%) = I/O bottleneck. |
| st | Time stolen by the hypervisor (VMs only). | High st (>5%) = resource contention on the host. |
Practical vmstat Examples
1. Basic System Snapshot
Run vmstat for a summary since boot:
vmstat
Sample Output:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 1234560 56780 2345670 0 0 45 98 123 456 2 1 96 1 0
Key Takeaway: r (1) = 1 runnable process. wa (1%) is low. No swapping (si=0, so=0).
2. Monitor in Real Time
Track system activity every 3 seconds:
vmstat 3
Red Flags: If b (blocked processes) >2, wa >15%, and bi/bo spike, I/O is likely causing delays.
3. Memory and Swap Details
Use -s for a detailed memory summary:
vmstat -s
Sample Output:
8192000 total memory
2345000 used memory
1234000 active memory
...
2097152 total swap
0 used swap
2097152 free swap
Key Takeaway: No swap is used here, so memory is sufficient.
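The "used swap" and "total swap" lines can be turned into a percentage to compare against the 50% rule of thumb mentioned earlier. A rough sketch, assuming the summary lines end in "total swap" and "used swap" (with or without a K unit); swap_used_pct is a hypothetical helper name.

```shell
# Print used swap as a percentage of total swap.
# Reads `vmstat -s` style text on stdin.
# NOTE: swap_used_pct is a hypothetical helper, not part of procps.
swap_used_pct() {
  awk '
    /total swap$/ { total = $1 }
    /used swap$/  { used = $1 }
    END {
      if (total > 0) printf "%.1f\n", used * 100 / total
      else print "no swap configured"
    }
  '
}
```

It would be used as `vmstat -s | swap_used_pct`.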
4. Disk I/O with -d
View per-disk I/O stats (similar to iostat but less detailed):
vmstat -d
Combining iostat and vmstat for Advanced Analysis
iostat and vmstat complement each other:
- vmstat highlights system-wide I/O wait (wa), memory pressure, and swapping.
- iostat identifies which specific disk is causing the I/O bottleneck.
Example Workflow: Diagnose High I/O Wait
- Use vmstat to detect high wa (I/O wait):
  vmstat 2
  If wa >20%, proceed.
- Use iostat -x to find the culprit disk:
  iostat -x 2
  Look for a device with %util >90% and await >50ms.
- Correlate with application logs to see if the busy disk is tied to a specific service (e.g., /var/lib/mysql on sda3).
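The first step of this workflow can be scripted so that iostat only runs when wa is actually high. The sketch below assumes the default vmstat column layout; wa_exceeds is a hypothetical helper name, and the 20% default simply mirrors the threshold used above.

```shell
# Succeed (exit 0) if any vmstat sample after the first shows wa above a limit.
# Reads vmstat output on stdin; the limit defaults to 20 (percent).
# NOTE: wa_exceeds is a hypothetical helper, not part of procps.
wa_exceeds() {
  awk -v limit="${1:-20}" '
    $1 == "r" && $2 == "b" {         # column-name row: locate wa
      for (i = 1; i <= NF; i++) if ($i == "wa") wa = i
      next
    }
    wa && $1 ~ /^[0-9]+$/ {
      # sample 1 is the since-boot average, so it is skipped
      if (++seen > 1 && ($wa + 0) > limit) hit = 1
    }
    END { exit (hit ? 0 : 1) }
  '
}
```

One way to chain the two tools would be `if vmstat 2 5 | wa_exceeds 20; then iostat -x 2 3; fi`.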
Common Use Cases
1. Identifying a Slow Disk
- iostat -x: Check for high %util, await, and avgqu-sz on a device.
- Example: A mechanical HDD with %util=95%, await=80ms, and avgqu-sz=4 is likely the bottleneck.
2. Memory Pressure Causing I/O
- vmstat: High swpd, si, so, and wa indicate swapping due to low memory.
- Fix: Add more RAM or reduce memory usage (e.g., stop unused services).
3. CPU vs. I/O Bottleneck
- vmstat: If us + sy >80% and wa <5%, it’s a CPU bottleneck.
- If wa >15% and us + sy <50%, it’s an I/O bottleneck.
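These two rules of thumb translate directly into a small shell function. This is a sketch only: classify_bottleneck is a hypothetical name, the thresholds are the ones stated above, and anything that matches neither rule is reported as inconclusive rather than guessed at.

```shell
# Classify one vmstat sample from its us, sy, and wa percentages (integers).
# NOTE: classify_bottleneck is a hypothetical helper; thresholds follow the text.
classify_bottleneck() {
  us=$1 sy=$2 wa=$3
  busy=$((us + sy))
  if [ "$busy" -gt 80 ] && [ "$wa" -lt 5 ]; then
    echo "cpu-bound"
  elif [ "$wa" -gt 15 ] && [ "$busy" -lt 50 ]; then
    echo "io-bound"
  else
    echo "inconclusive"
  fi
}
```

Assuming the default vmstat layout (us, sy, and wa in columns 13, 14, and 16), a live sample could be fed in with `classify_bottleneck $(vmstat 1 2 | tail -n 1 | awk '{print $13, $14, $16}')`.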
4. Monitoring Peak Loads
Run iostat -x 10 and vmstat 10 during peak hours (e.g., 9 AM–5 PM) to baseline normal I/O behavior.
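Once a run has been captured to a file, a baseline figure such as the average %util per device can be computed from it. A minimal sketch, assuming the log holds plain `iostat -x` output; avg_util and the log file name in the usage line are hypothetical.

```shell
# Print the average %util of one device across all reports in an iostat -x log.
# Usage: avg_util DEVICE < logfile
# NOTE: avg_util is a hypothetical helper, not part of sysstat.
avg_util() {
  awk -v dev="$1" '
    $1 ~ /^Device/ {                 # header row: locate the %util column
      for (i = 1; i <= NF; i++) if ($i == "%util") util = i
      next
    }
    util && $1 == dev { sum += $util; n++ }
    END {
      if (n) printf "%.2f\n", sum / n
      else print "no samples"
    }
  '
}
```

For instance, after `LC_ALL=C iostat -x 10 > iostat-peak.log`, running `avg_util sda < iostat-peak.log` gives a number to compare future runs against. The first report in the log is the since-boot average, so it slightly skews short captures.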
Troubleshooting Tips
- High %util in iostat: The disk is busy, but check avgqu-sz. If avgqu-sz is low, the device is keeping up with the workload (e.g., sequential reads on an SSD).
- High await but low %util: May point to slow I/O caused by hardware issues (e.g., a failing disk).
- High wa in vmstat but low %util in iostat: May signal inefficient I/O (e.g., small, random writes). Use iotop to find processes with high I/O.
- Swapping (si/so >0 in vmstat): Always investigate memory leaks or insufficient RAM before upgrading storage.
Conclusion
iostat and vmstat are indispensable tools for Linux I/O analysis. iostat excels at per-device disk diagnostics, while vmstat provides a bird’s-eye view of system health, including memory, CPU, and swapping. By combining their insights, you can quickly identify whether slow performance stems from an overloaded disk, memory pressure, or CPU saturation.
Practice using these tools in your environment to build familiarity with “normal” metrics—this will make anomalies (like sudden spikes in %util or wa) much easier to spot. For deeper dives, pair them with iotop (process-level I/O), sar (historical stats), or blktrace (low-level disk tracing).
References
- sysstat Documentation: https://github.com/sysstat/sysstat
- iostat Man Page: man iostat
- vmstat Man Page: man vmstat
- Linux Performance: https://linuxperformance.io/
- Red Hat System Monitoring Guide: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/