thelinuxvault guide

How to Use iostat and vmstat for Linux I/O Analysis

In the world of Linux system administration, identifying performance bottlenecks is a critical skill. Among the most common culprits of slow system performance are **I/O (Input/Output) issues**, such as slow disk reads/writes, excessive swapping, or misconfigured storage. To diagnose these problems, Linux offers two standard command-line tools: `iostat` and `vmstat`.

`iostat` (I/O statistics) reports CPU utilization and per-device disk activity, making it ideal for pinpointing per-device I/O bottlenecks. `vmstat` (virtual memory statistics) offers a broader view, reporting on processes, memory, paging, and system-wide I/O, helping you correlate I/O with CPU and memory behavior.

In this blog, we’ll dive deep into both tools: how to install them, interpret their outputs, and use them to diagnose real-world I/O issues. By the end, you’ll be equipped to analyze and resolve I/O-related performance problems.

Table of Contents

  1. What is iostat?
  2. Installing sysstat (Prerequisite)
  3. iostat Command Syntax
  4. Key iostat Metrics Explained
  5. Practical iostat Examples
  6. What is vmstat?
  7. vmstat Command Syntax
  8. Key vmstat Metrics Explained
  9. Practical vmstat Examples
  10. Combining iostat and vmstat for Advanced Analysis
  11. Common Use Cases
  12. Troubleshooting Tips
  13. Conclusion
  14. References

What is iostat?

iostat is a command-line tool that generates reports about CPU utilization and input/output statistics for block devices (e.g., hard drives, SSDs, partitions). It is part of the sysstat package, a collection of system monitoring utilities.

Primary Use Cases:

  • Identifying slow or overloaded disks.
  • Measuring read/write throughput per device.
  • Analyzing I/O queue lengths and service times.
  • Correlating disk activity with CPU usage.

Installing sysstat (Prerequisite)

iostat (and sar, another useful tool) is included in the sysstat package. If it’s not pre-installed on your system, install it using your package manager:

Debian/Ubuntu:

sudo apt update && sudo apt install sysstat  

RHEL/CentOS/Fedora:

sudo yum install sysstat   # For RHEL/CentOS  
sudo dnf install sysstat   # For Fedora  

After installation, verify iostat is available:

iostat -V
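If you want to check for both tools in one go, something like the following sketch could work. The helper name `have_tool` is made up for this example; it only wraps the standard `command -v` lookup.

```shell
# Hypothetical helper: succeeds if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# iostat ships with sysstat; vmstat usually ships with procps.
for tool in iostat vmstat; do
    if have_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found"
    fi
done
```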

iostat Command Syntax

The basic syntax for iostat is:

iostat [options] [device] [interval] [count]  
  • Options: Customize output (e.g., -c for CPU stats, -d for disk stats, -x for extended stats).
  • Device: Specify a particular block device (e.g., sda, nvme0n1) to monitor. Omit to see all devices.
  • Interval: Time (in seconds) between reports.
  • Count: Number of reports to generate. Omit for continuous monitoring.

Key iostat Metrics Explained

iostat outputs two main sections by default: CPU statistics and Device statistics. Use -x for extended disk metrics (highly recommended for deep I/O analysis).

CPU Statistics (from iostat or iostat -c)

| Column | Description | What to Look For |
|---|---|---|
| %user | CPU time spent on user-space processes. | High values may indicate application-level bottlenecks. |
| %nice | CPU time spent on processes with modified priority (nice). | Usually low; spikes may indicate priority-adjusted tasks. |
| %system | CPU time spent on kernel-space processes (system calls, I/O). | High values (>20%) may indicate kernel inefficiencies or excessive syscalls. |
| %iowait | CPU time waiting for I/O to complete (idle while waiting). | Critical: High values (>10%) suggest I/O bottlenecks (disk is slow). |
| %steal | CPU time stolen by the hypervisor (relevant for VMs). | High values (>5%) may indicate resource contention on the host. |
| %idle | CPU time idle (not user, system, or waiting for I/O). | Low values (<10%) suggest CPU saturation. |

Device Statistics (Basic: iostat -d; Extended: iostat -x)

Basic Disk Metrics

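| Column | Description |
|---|---|
| Device | Name of the block device (e.g., sda, sdb1). |
| tps | Transfers (I/O requests issued to the device) per second, reads and writes combined. |
| kB_read/s | Kilobytes read per second. |
| kB_wrtn/s | Kilobytes written per second. |
| kB_read | Total kilobytes read during the reporting interval (since boot for the first report). |
| kB_wrtn | Total kilobytes written during the reporting interval (since boot for the first report). |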

Extended Disk Metrics (iostat -x)

| Column | Description | What to Look For |
|---|---|---|
| rrqm/s | Read requests merged per second (merged to reduce I/O operations). | High values indicate efficient I/O scheduling (good). |
| wrqm/s | Write requests merged per second. | Same as above for writes. |
| r/s | Read requests per second (after merging). | High values may indicate heavy read workloads. |
| w/s | Write requests per second (after merging). | High values may indicate heavy write workloads. |
| rkB/s | Kilobytes read per second (same as kB_read/s in basic mode). | Throughput metric; compare to device specs (e.g., 500MB/s for SSD). |
| wkB/s | Kilobytes written per second (same as kB_wrtn/s in basic mode). | Same as above for writes. |
| avgrq-sz | Average request size (in sectors, 1 sector = 512 bytes). Newer sysstat releases report request sizes in kilobytes instead (areq-sz, or rareq-sz and wareq-sz split by direction). | Large values (>200 sectors, about 100 kB) usually mean sequential I/O; small values usually mean random I/O. |
| avgqu-sz | Average I/O queue length. Shown as aqu-sz in newer sysstat releases. | Critical: Values >2-3 indicate I/O requests are queuing (disk is slow). |
| await | Average time (ms) for I/O requests to complete (queue + service time). Some newer releases print only r_await and w_await. | Critical: Values >20ms suggest slow I/O (mechanical disks may be higher). |
| r_await | Average time (ms) for read requests to complete. | Isolate read-specific latency. |
| w_await | Average time (ms) for write requests to complete. | Isolate write-specific latency. |
| svctm | Average service time (ms) per I/O request (unreliable; removed in recent sysstat releases). | Use await instead; svctm does not account for queueing. |
| %util | Percentage of time the device was busy handling I/O requests. | Critical: Values >80% indicate the device is near saturation. For devices that serve requests in parallel (SSDs, RAID arrays), a high value does not necessarily mean the device is at its limit. |
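To make the avgrq-sz rule of thumb concrete, here is a small sketch that converts a request size from 512-byte sectors to KiB. The function name `sectors_to_kib` is invented for this example.

```shell
# Hypothetical helper: convert a size in 512-byte sectors to KiB.
sectors_to_kib() {
    awk -v s="$1" 'BEGIN { printf "%.1f\n", s * 512 / 1024 }'
}

sectors_to_kib 200   # the "large request" threshold from the table: prints 100.0
sectors_to_kib 8     # a 4 KiB request, typical of small random I/O: prints 4.0
```

If your iostat already prints areq-sz (or rareq-sz and wareq-sz), the values are in kilobytes and no conversion is needed.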

Practical iostat Examples

1. Basic System Overview

Run iostat without options for a quick snapshot of CPU and disk activity since boot:

iostat  

Sample Output:

Linux 5.4.0-100-generic (server)  09/20/2024  _x86_64_  (8 CPU)  

avg-cpu:  %user   %nice %system %iowait  %steal   %idle  
           2.34    0.01    0.89    0.56    0.00   96.20  

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn  
sda               5.23         45.62        120.34     123456     789012  
nvme0n1           0.89          2.10         15.67      45678      98765  

Key Takeaway: %iowait is low (0.56%), so I/O has not been a bottleneck on average since boot (short spikes would not show up in this report). sda has higher throughput than nvme0n1.

2. Monitor Disk I/O Continuously

To track disk activity in real time (every 5 seconds, 10 reports total):

iostat -x 5 10  

Focus Areas: Watch %util, await, and avgqu-sz for spikes. If %util >80% and avgqu-sz >3, the disk is likely overloaded.
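The check above can be automated with a small awk filter. This is only a sketch: the function name `flag_busy_devices` and its default thresholds (80 and 3, taken from the rule of thumb above) are assumptions. It finds the columns by header name, because column order and the queue column name (avgqu-sz or aqu-sz) differ between sysstat versions.

```shell
# Hypothetical filter: read `iostat -x` output on stdin and print devices
# whose %util and queue length both exceed the given thresholds.
flag_busy_devices() {
    awk -v max_util="${1:-80}" -v max_queue="${2:-3}" '
        # Header row: remember where %util and the queue column are.
        $1 ~ /^Device/ {
            util = queue = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "%util") util = i
                if ($i == "avgqu-sz" || $i == "aqu-sz") queue = i
            }
            next
        }
        # A blank line ends the device table for this report.
        NF == 0 { util = queue = 0; next }
        # Device rows: compare both metrics against the thresholds.
        util && queue && NF >= util && NF >= queue {
            if ($util + 0 > max_util && $queue + 0 > max_queue)
                printf "%s util=%s queue=%s\n", $1, $util, $queue
        }
    '
}
```

A typical invocation might be `LC_ALL=C iostat -x 5 10 | flag_busy_devices 80 3`, where `LC_ALL=C` keeps the decimal separator a dot.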

3. Analyze a Specific Device

Monitor only sda with extended metrics:

iostat -x sda 2  

4. CPU-Only Report

Isolate CPU metrics (useful for checking %iowait without disk clutter):

iostat -c 3  
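If you only want the %iowait figure from each report (for logging or alerting), a filter along these lines could extract it. The name `iowait_values` is hypothetical. The sketch assumes the avg-cpu header is followed directly by its row of values, as in the sample output above.

```shell
# Hypothetical filter: read `iostat -c` output on stdin and print the
# %iowait value from each report, one per line.
iowait_values() {
    awk '
        # Header row: find %iowait. The value row has no "avg-cpu:" label,
        # so the value sits one field to the left of the header position.
        $1 == "avg-cpu:" {
            col = 0
            for (i = 2; i <= NF; i++) if ($i == "%iowait") col = i - 1
            want = 1
            next
        }
        # The line right after the header holds the values.
        want && col { print $col; want = 0 }
    '
}
```

For example, `LC_ALL=C iostat -c 3 | iowait_values` would print one number every three seconds.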

What is vmstat?

vmstat (Virtual Memory Statistics) reports on system-wide statistics, including processes, memory, paging, block I/O, traps, and CPU usage. Unlike iostat, it does not focus on per-device disk stats but provides a holistic view of system health—making it ideal for identifying memory or CPU-related I/O issues.

Primary Use Cases:

  • Detecting memory pressure (swapping, cache thrashing).
  • Monitoring system-wide I/O wait (wa in CPU stats).
  • Correlating paging activity with disk I/O.

vmstat Command Syntax

The basic syntax for vmstat is:

vmstat [options] [interval] [count]  
  • Options: -s (summary of memory stats), -d (disk I/O stats), -t (add timestamp).
  • Interval: Time (seconds) between reports.
  • Count: Number of reports (omit for continuous monitoring).

Key vmstat Metrics Explained

vmstat groups its output into six sets of columns. Here’s what each means:

1. Processes (procs)

| Column | Description |
|---|---|
| r | Number of runnable processes (running or waiting for CPU time). |
| b | Number of processes blocked in uninterruptible sleep, typically waiting for disk I/O to complete. |

What to Look For: A high b value (>2-3) indicates processes are stuck waiting for I/O, pointing to a disk bottleneck.

2. Memory (memory)

| Column | Description |
|---|---|
| swpd | Amount of virtual memory (swap) used (in kB). |
| free | Free physical memory (kB). |
| buff | Memory used for buffers (temporary storage for disk I/O). |
| cache | Memory used for page cache (files cached from disk). |

What to Look For:

  • High swpd (>50% of total swap) may indicate memory pressure.
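  • Low free on its own is normal, because Linux uses spare memory for cache. Low free together with high swpd and nonzero si/so suggests the system is swapping, which causes heavy I/O.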

3. Swap (swap)

| Column | Description |
|---|---|
| si | Swap in (kB/s): Data read from swap to memory (paging in). |
| so | Swap out (kB/s): Data written from memory to swap (paging out). |

What to Look For: Sustained si/so >0 indicates active swapping, leading to increased disk I/O.
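To judge whether swapping is sustained rather than a one-off, you could count how many samples show any swap traffic. The sketch below uses an invented function name, `swap_activity`. It skips the first data row, which vmstat reports as an average since boot.

```shell
# Hypothetical filter: read `vmstat <interval> <count>` output on stdin and
# report how many samples (after the first) had si or so above zero.
swap_activity() {
    awk '
        # Column header row: map each column name to its position.
        $1 == "r" && $2 == "b" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        ("si" in col) && $1 ~ /^[0-9]+$/ {
            seen++
            if (seen == 1) next      # first row is the since-boot average
            total++
            if ($(col["si"]) + $(col["so"]) > 0) active++
        }
        END { printf "%d of %d samples show swap activity\n", active, total }
    '
}
```

Something like `vmstat 5 13 | swap_activity` would then summarize roughly one minute of samples.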

4. I/O (io)

| Column | Description |
|---|---|
| bi | Blocks received from a block device per second (reads; vmstat counts 1 block = 1024 bytes). |
| bo | Blocks sent to a block device per second (writes; vmstat counts 1 block = 1024 bytes). |

What to Look For: bi/bo correlate with disk activity. Spikes here may align with high %iowait in CPU stats.

5. System (system)

| Column | Description |
|---|---|
| in | Interrupts per second (including clock interrupts). |
| cs | Context switches per second (process/thread switches). |

What to Look For: High cs (>10k/s) may indicate excessive process switching, increasing CPU overhead.

6. CPU (cpu)

| Column | Description | What to Look For |
|---|---|---|
| us | Time spent on user-space processes. | High values (>70%) may indicate CPU-bound applications. |
| sy | Time spent on kernel-space processes. | High values (>30%) may indicate kernel inefficiencies. |
| id | Idle time (not user, system, or waiting for I/O). | Low values (<10%) = CPU saturation. |
| wa | Time waiting for I/O (equivalent to %iowait in iostat). | Critical: High wa (>10%) = I/O bottleneck. |
| st | Time stolen by the hypervisor (VMs only). | High st (>5%) = resource contention on the host. |

Practical vmstat Examples

1. Basic System Snapshot

Run vmstat for a summary since boot:

vmstat  

Sample Output:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----  
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st  
 1  0      0 1234560  56780 2345670    0    0    45    98  123  456  2  1 96  1  0  

Key Takeaway: r (1) = 1 runnable process (running or waiting for CPU). wa (1%) is low. No swapping (si=0, so=0).

2. Monitor in Real Time

Track system activity every 3 seconds:

vmstat 3  

Red Flags: If b (blocked processes) >2, wa >15%, and bi/bo spike, I/O is likely causing delays.
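These red flags can be watched for automatically. The following is a sketch only: `flag_io_stalls` is a made-up name, and the defaults (b > 2, wa > 15) simply mirror the thresholds above. Columns are looked up by header name, so extra columns (for example from `vmstat -t`) should not break it. Note that it does not skip the first row, which is an average since boot.

```shell
# Hypothetical filter: read vmstat output on stdin and print the samples
# where both blocked processes and I/O wait exceed the thresholds.
flag_io_stalls() {
    awk -v max_b="${1:-2}" -v max_wa="${2:-15}" '
        # Column header row: map each column name to its position.
        $1 == "r" && $2 == "b" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        ("wa" in col) && $1 ~ /^[0-9]+$/ {
            b = $(col["b"]); wa = $(col["wa"])
            if (b + 0 > max_b && wa + 0 > max_wa)
                printf "blocked=%s wa=%s bi=%s bo=%s\n", b, wa, $(col["bi"]), $(col["bo"])
        }
    '
}
```

Usage could look like `vmstat 3 | flag_io_stalls`, leaving it running while you reproduce the slowdown.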

3. Memory and Swap Details

Use -s for a detailed memory summary:

vmstat -s  

Sample Output:

      8192000 K total memory
      2345000 K used memory
      1234000 K active memory
      ...
      2097152 K total swap
            0 K used swap
      2097152 K free swap

Key Takeaway: No swap is used here, so there is no sign of memory pressure.
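To compare swap usage against the ">50% of total swap" rule of thumb mentioned earlier, you could compute the percentage from the `vmstat -s` output. This sketch uses an invented function name, `swap_used_percent`, and relies only on the "total swap" and "used swap" labels shown in the sample.

```shell
# Hypothetical filter: read `vmstat -s` output on stdin and print the
# percentage of swap in use.
swap_used_percent() {
    awk '
        /total swap/ { total = $1 }
        /used swap/  { used = $1 }
        END {
            if (total > 0) printf "%.1f\n", used * 100 / total
            else print "no swap configured"
        }
    '
}
```

For example, `vmstat -s | swap_used_percent`.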

4. Disk I/O with -d

View per-disk I/O counters (cumulative totals since boot; less detailed than iostat):

vmstat -d  

Combining iostat and vmstat for Advanced Analysis

iostat and vmstat complement each other:

  • vmstat highlights system-wide I/O wait (wa), memory pressure, and swapping.
  • iostat identifies which specific disk is causing the I/O bottleneck.

Example Workflow: Diagnose High I/O Wait

  1. Use vmstat to detect high wa (I/O wait):

    vmstat 2  

    If wa >20%, proceed.

  2. Use iostat -x to find the culprit disk:

    iostat -x 2  

    Look for a device with %util >90% and await >50ms.

  3. Correlate with application logs to see if the busy disk is tied to a specific service (e.g., /var/lib/mysql on sda3).
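For step 3, one way to tie a directory to a device is to ask `df` which filesystem holds it. The function name `device_for_path` is hypothetical, and the path in the usage note is just the example from the text.

```shell
# Hypothetical helper: print the filesystem source (often a device such as
# /dev/sda3) that holds the given path. df -P keeps each entry on one line.
device_for_path() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

device_for_path /
```

For the example in the text, `device_for_path /var/lib/mysql` would be expected to print the backing device, which you can then pass to `iostat -x`. On LVM or other device-mapper setups the result may be a /dev/mapper name, in which case `lsblk` can help trace it back to the physical disk.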

Common Use Cases

1. Identifying a Slow Disk

  • iostat -x: Check for high %util, await, and avgqu-sz on a device.
  • Example: A mechanical HDD with %util=95%, await=80ms, and avgqu-sz=4 is likely the bottleneck.

2. Memory Pressure Causing I/O

  • vmstat: High swpd, si, so, and wa indicate swapping due to low memory.
  • Fix: Add more RAM or reduce memory usage (e.g., stop unused services).

3. CPU vs. I/O Bottleneck

  • vmstat: If us + sy >80% and wa <5%, it’s a CPU bottleneck.
  • If wa >15% and us + sy <50%, it’s an I/O bottleneck.
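The two rules above translate directly into a small decision function. This is a sketch with an invented name, `classify_bottleneck`. It takes the us, sy, and wa values from one vmstat sample (integers, as vmstat prints them) and treats everything outside the two rules as inconclusive.

```shell
# Hypothetical helper: classify one vmstat sample using the thresholds above.
# Arguments: us sy wa (integer percentages).
classify_bottleneck() {
    us=$1 sy=$2 wa=$3
    busy=$((us + sy))
    if [ "$busy" -gt 80 ] && [ "$wa" -lt 5 ]; then
        echo "cpu"
    elif [ "$wa" -gt 15 ] && [ "$busy" -lt 50 ]; then
        echo "io"
    else
        echo "inconclusive"
    fi
}

classify_bottleneck 70 15 2    # prints cpu
classify_bottleneck 10 5 40    # prints io
```

A single sample is rarely conclusive, so it makes sense to apply this to several consecutive samples before drawing a conclusion.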

4. Monitoring Peak Loads

Run iostat -x 10 and vmstat 10 during peak hours (e.g., 9 AM–5 PM) to baseline normal I/O behavior.
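Both tools take a report count, so you can size a baseline run to the window you care about. The sketch below computes that count; `reports_for` is a made-up helper name.

```shell
# Hypothetical helper: number of reports needed to cover a window.
# Arguments: hours interval_seconds
reports_for() {
    hours=$1 interval=$2
    echo $(( hours * 3600 / interval ))
}

reports_for 8 10    # 9 AM to 5 PM at 10-second intervals: prints 2880
```

You might then start both captures at 9 AM with `iostat -xt 10 "$(reports_for 8 10)" > iostat-baseline.log &` and `vmstat -t 10 "$(reports_for 8 10)" > vmstat-baseline.log &`. The log file names are placeholders, and `-t` adds timestamps so the two logs can be lined up later.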

Troubleshooting Tips

  • High %util in iostat: The disk is busy, but check avgqu-sz. If avgqu-sz is low, requests are not piling up and the device is keeping up with the workload (e.g., sequential reads on an SSD).
  • High await but low %util: Requests are slow even though the device is mostly idle, which may point to hardware issues (e.g., a failing disk).
  • High wa in vmstat but low %util in iostat: The wait may come from storage you are not watching (e.g., a network filesystem), or may signal inefficient I/O (e.g., small, random writes). Use iotop to find processes with high I/O.
  • Swapping (si/so >0 in vmstat): Always investigate memory leaks or insufficient RAM before upgrading storage.

Conclusion

iostat and vmstat are indispensable tools for Linux I/O analysis. iostat excels at per-device disk diagnostics, while vmstat provides a bird’s-eye view of system health, including memory, CPU, and swapping. By combining their insights, you can quickly identify whether slow performance stems from an overloaded disk, memory pressure, or CPU saturation.

Practice using these tools in your environment to build familiarity with “normal” metrics—this will make anomalies (like sudden spikes in %util or wa) much easier to spot. For deeper dives, pair them with iotop (process-level I/O), sar (historical stats), or blktrace (low-level disk tracing).

References