Table of Contents
- Introduction to IOstat
- Installing IOstat
- Basic Syntax and Output Explained
- Key Metrics to Monitor
- Advanced IOstat Usage: Options and Flags
- Real-World Scenarios and Examples
- Tips for Effective I/O Analysis
- Conclusion
- References
Introduction to IOstat
IOstat (Input/Output Statistics) is a command-line utility that reports CPU utilization and disk I/O statistics on Linux systems. Unlike tools like df (which shows disk usage) or fdisk (which manages partitions), iostat focuses on performance metrics—tracking how efficiently the system is reading from and writing to storage devices over time.
Why Use IOstat?
- Troubleshoot Bottlenecks: Identify whether slow system performance stems from disk I/O (e.g., an overloaded HDD) or other resources (e.g., CPU, memory).
- Monitor Storage Health: Track trends in disk usage (e.g., increasing write latency) to predict failures or upgrade needs.
- Optimize Workloads: Adjust application configurations (e.g., caching, batch processing) based on I/O patterns (e.g., heavy read vs. write workloads).
IOstat is lightweight and preinstalled on most Linux distributions, making it a go-to tool for both casual monitoring and deep dives.
Installing IOstat
IOstat is part of the sysstat package, which includes other utilities like sar (system activity reporter) and mpstat (CPU monitoring). If iostat isn’t already installed on your system, install sysstat using your distribution’s package manager:
For Debian/Ubuntu:
sudo apt update && sudo apt install sysstat
For RHEL/CentOS/Rocky Linux:
sudo yum install sysstat # RHEL 7/CentOS 7
# OR
sudo dnf install sysstat # RHEL 8+/CentOS 8+/Rocky Linux
For Arch Linux:
sudo pacman -S sysstat
After installation, verify iostat is available:
iostat --version
Output (example):
sysstat 12.5.2
(C) Sebastien Godard (sysstat <at> orange.fr)
Basic Syntax and Output Explained
The basic syntax for iostat is:
iostat [options] [interval] [count]
- options: Flags to customize output (e.g., -x for extended stats, -k for KB units).
- interval: Time (in seconds) between successive reports.
- count: Number of reports to generate (optional; omit for continuous monitoring).
Example: Basic Usage
To view I/O statistics since the system booted (default behavior):
iostat
Sample Output:
Linux 5.4.0-122-generic (server01) 09/15/2023 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
2.34 0.01 0.89 1.23 0.00 95.53
Device tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 3.21 45.67 123.45 1234567 8765432
sdb 0.56 2.34 5.67 89012 45678
Breaking Down the Output
The output has two main sections: CPU Statistics and Device Statistics.
1. CPU Statistics (avg-cpu)
This section reports CPU utilization percentages since the last report (or since boot, for the first run).
| Metric | Description |
|---|---|
| %user | Time spent running user-space processes (e.g., applications like nginx). |
| %nice | Time spent running user-space processes at a nice priority (niced processes). |
| %system | Time spent in kernel space (e.g., servicing disk I/O, network operations). |
| %iowait | Time the CPU was idle while waiting for outstanding I/O operations to complete (critical for I/O analysis). |
| %steal | Time a virtual CPU waited involuntarily while the hypervisor served other virtual machines. |
| %idle | Time the CPU was idle with no outstanding I/O requests. |
Key Takeaway: A high %iowait (e.g., >20%) often indicates an I/O bottleneck—CPU cores are sitting idle because they’re waiting for the disk to respond.
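As a quick sketch, the avg-cpu line from a captured report can be checked in a pipeline. The sample values below are hypothetical, and the awk field number assumes the six-column layout shown above (%iowait is field 4):

```shell
# Flag a high %iowait from a captured avg-cpu values line (hypothetical sample;
# field 4 is %iowait in the %user %nice %system %iowait %steal %idle layout).
cpu_line='2.34    0.01    0.89   31.23    0.00   65.53'
echo "$cpu_line" | awk '{ if ($4 > 20) print "warning: %iowait is " $4 "%" }'
# → warning: %iowait is 31.23%
```

In practice you would feed this the second avg-cpu line of an interval report rather than a hard-coded string.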
2. Device Statistics
This section reports I/O activity for each storage device (e.g., sda, sdb).
| Metric | Description |
|---|---|
| Device | Name of the storage device (e.g., sda, nvme0n1). |
| tps | Transfers per second (I/O requests issued to the device per second). |
| Blk_read/s | Blocks read per second (read throughput). |
| Blk_wrtn/s | Blocks written per second (write throughput). |
| Blk_read | Total blocks read since boot. |
| Blk_wrtn | Total blocks written since boot. |
Note: By default, “blocks” are typically 512 bytes (check with getconf BLOCK_SIZE), but this varies by system. Use -k or -m to force KB/MB units (see Advanced Options).
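To turn the raw block counters into familiar units, you can post-process a captured report. This is a minimal sketch that assumes 512-byte blocks and reuses the two sample device lines from the output above:

```shell
# Sum the Blk_wrtn column (field 6) across the sample device lines above,
# then convert blocks to MB assuming 512-byte blocks.
devices='sda 3.21 45.67 123.45 1234567 8765432
sdb 0.56 2.34 5.67 89012 45678'
echo "$devices" | awk '{ blocks += $6 } END { printf "%.0f MB written since boot\n", blocks * 512 / 1048576 }'
# → 4302 MB written since boot
```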
Key Metrics to Monitor
To diagnose I/O issues, focus on these critical metrics:
1. %iowait (CPU Section)
As mentioned, %iowait measures CPU idle time spent waiting for I/O. A consistently high %iowait (e.g., >10-15%) suggests the system is I/O-bound.
Example: If %iowait is 30%, it means 30% of the time, the CPU is idle because it’s waiting for the disk—not because there’s no work to do.
2. tps (Transactions Per Second)
tps counts the number of I/O operations (reads + writes) per second. High tps (e.g., >1000 on a mechanical HDD) may indicate the disk is struggling to keep up with requests.
3. Blk_read/s and Blk_wrtn/s (Throughput)
These metrics measure read/write throughput (data transferred per second). For example, Blk_read/s = 1000 (with 512-byte blocks) equals 500 KB/s read throughput.
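The arithmetic above can be checked directly with shell arithmetic (assuming 512-byte blocks):

```shell
# 1000 blocks/s at 512 bytes per block, expressed in KB/s.
blk_read_per_s=1000
block_size=512
kb_per_s=$(( blk_read_per_s * block_size / 1024 ))
echo "${kb_per_s} KB/s"   # → 500 KB/s
```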
4. %util (Extended Metric)
Available with the -x flag (extended stats), %util shows the percentage of time the device was busy handling I/O requests. A %util near 100% indicates the disk is fully saturated—no time left to process new requests, leading to latency.
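As a sketch, here is one way to pull saturated devices out of a captured extended report. The sample lines and the 80% threshold are illustrative, and the awk field number depends on the column layout of your sysstat version (in this simplified capture, %util is field 7):

```shell
# Print devices whose %util (field 7 of this simplified capture) exceeds 80%.
sample='Device            r/s   w/s  rkB/s   wkB/s  await  %util
sda              1.23  2.45  45.67  123.45  89.12  97.65
sdb              0.56  0.34   2.34    5.67   4.10  12.30'
echo "$sample" | awk 'NR > 1 && $7 + 0 > 80 { print $1 " is saturated (" $7 "% util)" }'
# → sda is saturated (97.65% util)
```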
Advanced IOstat Usage: Options and Flags
IOstat’s real power lies in its options, which let you filter, format, and deepen your analysis. Below are the most useful flags:
-x: Extended Device Statistics
Adds detailed metrics for each device, including latency and queue depth.
Example:
iostat -x 5 # Report extended stats every 5 seconds
Sample Output:
Device r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 1.23 2.45 45.67 123.45 48.21 0.34 89.12 12.34 98.76 2.10 7.65
Extended Metrics Explained:
| Metric | Description |
|---|---|
| r/s, w/s | Read/write requests per second (the read and write components of tps). |
| rkB/s, wkB/s | Read/write throughput in KB/s (use -m for MB/s). |
| avgrq-sz | Average request size (in 512-byte sectors). Larger requests are more efficient (fewer IOPS for the same throughput). |
| avgqu-sz | Average number of I/O requests queued for the device. High values indicate a backlog. |
| await | Average time (ms) for an I/O request to complete (queue time + service time). |
| r_await, w_await | Read- and write-specific await (isolate slow reads vs. slow writes). |
| svctm | Average service time (ms) the device spent actively processing each request. Unreliable on modern kernels and removed in recent sysstat versions. |
| %util | Percentage of time the device was busy servicing requests (saturated near 100%). |
-k or -m: Units in KB/MB
By default, iostat uses “blocks” (varies by system). Use -k for KB or -m for MB to standardize units.
Example:
iostat -k # Show throughput in KB/s
iostat -m # Show throughput in MB/s
-d: Only Device Statistics
Omit the CPU section to focus solely on disk I/O.
Example:
iostat -d sda # Show only stats for device sda
-t: Add Timestamp
Include a timestamp with each report (useful for logging).
Example:
iostat -t 5 # Report every 5 seconds with timestamps
-p [device]: Include Partitions
Show stats for a device and its partitions (e.g., sda1, sda2).
Example:
iostat -p sda # Show sda and its partitions
-c: Only CPU Statistics
Omit device stats to focus on CPU utilization.
Example:
iostat -c 2 # Report CPU stats every 2 seconds
Real-World Scenarios and Examples
Let’s apply iostat to common troubleshooting and monitoring tasks.
Scenario 1: Troubleshooting a Slow Application
A web server running nginx is slow to respond. You suspect an I/O bottleneck.
Step 1: Run iostat with extended stats to check disk health:
iostat -x 5
Key Observations:
- %iowait is 35% (high CPU idle time waiting for I/O).
- sda %util is 98% (disk is saturated).
- await is 200 ms (average I/O latency is very high).
Conclusion: The disk (sda) is overloaded. Solutions: Add faster storage (SSD), optimize I/O (e.g., enable caching), or reduce load (e.g., move logs to another disk).
Scenario 2: Identifying Read vs. Write Issues
A database server is slow during writes.
Step 1: Use -x to isolate read/write metrics:
iostat -x 5
Key Observations:
- w/s (writes per second) is 500; r/s is 10.
- w_await is 300 ms (write latency is high); r_await is 10 ms (reads are fast).
Conclusion: Write operations are causing delays. Check for:
- Slow storage (e.g., mechanical HDD instead of SSD).
- Misconfigured database (e.g., too many small write operations).
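A quick way to confirm the read/write asymmetry from a captured device line (hypothetical values; field positions follow the iostat -x column order shown earlier, with r_await in field 9 and w_await in field 10):

```shell
# Flag devices whose write latency dwarfs read latency (w_await > 10x r_await).
line='sda  10.00  500.00  40.00  12000.00  24.00  4.00  290.00  10.00  300.00  1.20  98.00'
echo "$line" | awk '$10 > 10 * $9 { print $1 ": writes (" $10 " ms) are much slower than reads (" $9 " ms)" }'
# → sda: writes (300.00 ms) are much slower than reads (10.00 ms)
```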
Scenario 3: Monitoring After a Storage Upgrade
You upgraded from an HDD to an SSD and want to verify performance improvements.
Step 1: Capture baseline stats before upgrade:
iostat -x 5 10 > pre_upgrade_iostat.txt # 10 reports, 5s apart
Step 2: After upgrade, capture new stats:
iostat -x 5 10 > post_upgrade_iostat.txt
Comparison:
- Pre-upgrade: %util = 95%, await = 150 ms.
- Post-upgrade: %util = 30%, await = 15 ms.
Result: SSD reduced latency and device utilization, confirming the upgrade resolved the bottleneck.
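To compare the two captures numerically, you can average a column across all reports in each file. This sketch fabricates two tiny stand-in files (real iostat -x captures have headers and more columns, so adjust the field number to your layout; await is assumed to be field 6 here):

```shell
# Create tiny stand-in capture files (real iostat -x output has more columns).
cat > pre_upgrade_iostat.txt <<'EOF'
sda 1.0 2.0 45.0 120.0 150.0
sda 1.1 2.1 46.0 121.0 150.0
EOF
cat > post_upgrade_iostat.txt <<'EOF'
sda 1.0 2.0 45.0 120.0 15.0
sda 1.1 2.1 46.0 121.0 15.0
EOF
# Average field 6 (await in this stand-in layout) over all sda lines.
avg_await() { awk '$1 == "sda" { sum += $6; n++ } END { if (n) printf "%.1f\n", sum / n }' "$1"; }
echo "pre:  $(avg_await pre_upgrade_iostat.txt) ms"
echo "post: $(avg_await post_upgrade_iostat.txt) ms"
# → pre:  150.0 ms
# → post: 15.0 ms
```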
Tips for Effective I/O Analysis
- Avoid “Since Boot” Averages: Always use an interval (e.g., iostat 5) to get current stats, not averages since boot.
- Combine with Other Tools: Use iotop to identify which process is causing I/O, vmstat for system-wide memory/swap, or sar for historical trends.
- Watch %util and await: %util above 80% together with await above 20 ms often indicates a saturated disk.
- Beware of %iowait Pitfalls: %iowait only measures idle CPU time that coincided with outstanding I/O, so busy CPUs can hide an I/O problem, and a high value doesn’t always mean storage is slow. Correlate with %util and await.
- Standardize Units: Use -k or -m to avoid confusion with block sizes.
Conclusion
IOstat is a foundational tool for Linux I/O analysis, offering a balance of simplicity and depth. By mastering its syntax, metrics, and advanced options, you can diagnose bottlenecks, monitor storage health, and optimize system performance. Remember: the key to effective I/O analysis is combining iostat’s insights with context (e.g., application behavior, hardware specs) and other tools like iotop or sar.
Whether you’re a system administrator, developer, or DevOps engineer, iostat should be in your toolkit for keeping Linux systems running smoothly.
References
- Sysstat Official Documentation
- IOstat Man Page
- Linux Performance: IOstat (Brendan Gregg’s Blog)
- Understanding Linux I/O (Kernel Documentation)
- Sysstat Installation Guide (GitHub)