Table of Contents
- Introduction to I/O Schedulers
- How I/O Schedulers Work: Core Concepts
- Common Linux I/O Schedulers Explained
- Factors to Consider When Choosing an I/O Scheduler
- How to View and Change the I/O Scheduler
- Performance Tuning Tips
- Conclusion
- References
Introduction to I/O Schedulers
Before diving into specifics, let’s clarify what an I/O scheduler is not. It’s not the storage device itself, nor is it the filesystem (e.g., ext4, XFS). Instead, it sits between the kernel’s block layer (which abstracts storage devices) and the storage hardware, acting as a traffic controller for I/O requests.
Early storage devices (like HDDs) had mechanical parts: a spinning platter and a moving read/write head. Seeking data across the platter (moving the head) was slow, so early I/O schedulers focused on reducing seek time by reordering requests (e.g., sorting them by physical location on the disk).
Modern storage (SSDs, NVMe) has no moving parts, so seek time is negligible. Here, schedulers prioritize low latency and efficient queue management to handle the high throughput of these devices.
Linux has evolved a variety of I/O schedulers, each optimized for different workloads. Let’s first understand the core mechanisms they use.
How I/O Schedulers Work: Core Concepts
To effectively manage I/O requests, schedulers rely on a few key techniques:
1. Request Queuing
I/O requests (reads/writes) are stored in a queue before being sent to the device. Schedulers manage this queue to optimize order and timing.
2. Request Merging
Adjacent requests (e.g., two writes to consecutive sectors) are merged into a single larger request. This reduces overhead and improves throughput (critical for HDDs and SSDs alike).
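Merging can be observed directly with iostat's extended statistics: the rrqm/s and wrqm/s columns count merged read and write requests per second. A quick check, assuming the sysstat package is available:

```shell
# Print one extended statistics report; rrqm/s and wrqm/s show
# how many requests the block layer merged per second per device.
if command -v iostat >/dev/null 2>&1; then
  iostat -x 1 1
else
  echo "iostat not found; install the sysstat package"
fi
```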
3. Request Sorting (Elevator Algorithm)
Inspired by elevator behavior, this reorders requests to minimize movement. For HDDs, this means sorting by sector number to reduce seek time. For SSDs, sorting may still help by aligning with internal parallelism.
4. Prioritization
Some requests (e.g., reads from a video player) are more latency-sensitive than others (e.g., background backups). Schedulers prioritize reads over writes or interactive tasks over batch jobs.
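Per-process I/O priorities can also be set from user space with ionice (honored by schedulers that implement priority classes, such as CFQ and BFQ). A sketch, with a placeholder command standing in for a real backup job:

```shell
# Print the current shell's I/O scheduling class and priority.
ionice -p $$

# Run a hypothetical backup in the "idle" class (-c 3), so it only
# gets disk time when no other process is doing I/O.
ionice -c 3 echo "backup would run here with idle I/O priority"
```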
5. Single-Queue vs. Multi-Queue (blk-mq)
Older Linux kernels used a single-queue model, where all I/O requests for a device shared one queue protected by a single lock. blk-mq (block multi-queue), introduced in kernel 3.13, splits requests into per-CPU software queues that feed one or more hardware dispatch queues. This reduces lock contention and improves parallelism, which is critical for high-speed storage like NVMe. Since kernel 5.0, blk-mq is the only block path.
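On a blk-mq kernel, a device's hardware dispatch queues are visible under sysfs. A sketch that loops over whatever block devices exist (device names vary per system):

```shell
# Each subdirectory of <dev>/mq is one hardware dispatch queue.
for dev in /sys/block/*; do
  if [ -d "$dev/mq" ]; then
    echo "${dev##*/}: $(ls "$dev/mq" | wc -l) hardware queue(s)"
  fi
done
```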
Common Linux I/O Schedulers Explained
Linux supports several I/O schedulers, each with unique strengths. Below is a deep dive into the most popular ones:
Noop (No Operation)
Overview: The simplest scheduler: it does almost nothing. It uses a basic FIFO (First-In-First-Out) queue, with minimal merging and no reordering. On blk-mq kernels, its equivalent is called none.
Algorithm:
- Requests are processed in the order they arrive.
- Merges adjacent requests but does not sort them.
Pros:
- Extremely low overhead (minimal CPU usage).
- Ideal for storage devices with their own internal schedulers (e.g., SSDs, NVMe, or hardware RAID controllers), which handle request optimization better than the OS.
Cons:
- Poor performance for HDDs (no seek-time optimization).
Best For:
- SSDs/NVMe (since they have no seek time).
- Virtual machines (VMs), where the hypervisor or underlying storage handles scheduling.
- Embedded systems with limited CPU resources.
Deadline
Overview: Designed to prevent request “starvation” by assigning each read and write a deadline (a soft latency target). It balances throughput and latency.
Algorithm:
- Maintains four queues:
- A sorted read queue and a sorted write queue (by sector, like an elevator), used for normal dispatch.
- Two FIFO queues (one for reads, one for writes), ordered by each request’s expiry time.
- Reads have a default deadline of 500ms; writes have 5000ms (tunable).
- The scheduler normally serves the sorted queues for throughput, but when the request at the head of a FIFO queue passes its deadline, that request is dispatched next to bound latency.
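The deadlines are exposed as sysfs tunables when deadline/mq-deadline is active on a device. A probe sketch (sda is a placeholder device name):

```shell
# Print the read/write expiry tunables if deadline/mq-deadline
# is the active scheduler on the example device.
q=/sys/block/sda/queue/iosched
if [ -r "$q/read_expire" ]; then
  echo "read_expire:  $(cat "$q/read_expire") ms"
  echo "write_expire: $(cat "$q/write_expire") ms"
else
  echo "deadline scheduler not active on sda (or device absent)"
fi
```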
Pros:
- Prevents long delays for critical requests (e.g., database queries).
- Better throughput than Noop for HDDs (due to sorting).
Cons:
- Slightly higher overhead than Noop.
Best For:
- Mixed workloads (reads + writes) where latency matters (e.g., web servers, databases).
- HDDs (balances seek optimization and latency).
CFQ (Completely Fair Queueing)
Overview: The default scheduler from kernel 2.6.18 until the move to blk-mq, CFQ focuses on fairness by allocating time slices to processes, similar to CPU scheduling.
Algorithm:
- Creates a separate queue for each process (PID).
- Rotates through queues, giving each process a “time slice” to send requests to the device.
- Sorts requests within each process’s queue to optimize for HDDs.
Pros:
- Fairness: Prevents a single process from monopolizing storage (good for multi-user systems).
Cons:
- High overhead (due to per-process queue management).
- Poor performance for SSDs/NVMe (unnecessary sorting and fairness logic).
- Removed (along with the legacy single-queue block layer) in kernel 5.0; replaced in practice by the multi-queue schedulers BFQ and Kyber.
Best For:
- Legacy systems or workloads requiring strict fairness (e.g., shared hosting servers).
BFQ (Budget Fair Queueing)
Overview: A newer scheduler (merged in kernel 4.12) designed for low latency and fairness, especially for interactive tasks (e.g., web browsing, video playback).
Algorithm:
- Like CFQ, BFQ uses per-process queues but allocates “budgets” (number of sectors) instead of time slices.
- Prioritizes latency-sensitive tasks (e.g., reads) and interactive applications.
- Features a “low-latency” mode for desktop use, reducing lag during concurrent I/O.
Pros:
- Excellent for interactive workloads (desktops, laptops).
- Better throughput than CFQ for SSDs.
- Multi-queue support (via blk-mq).
Cons:
- Slightly higher overhead than Deadline or Noop.
Best For:
- Desktop/laptop users (video editing, gaming, web browsing).
- Systems with mixed interactive and background workloads.
Kyber
Overview: Introduced in kernel 4.12, Kyber is a lightweight, low-overhead scheduler optimized for multi-queue (blk-mq) systems and low-latency workloads.
Algorithm:
- Uses two priority classes: “sync” (latency-sensitive, e.g., reads) and “async” (throughput-focused, e.g., writes).
- Dynamically adjusts the number of in-flight requests to balance latency and throughput.
- Minimal sorting; focuses on queue depth management.
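Kyber’s main knobs are its target latencies for the sync and async domains, exposed in sysfs as read_lat_nsec and write_lat_nsec (in nanoseconds). A probe sketch, with nvme0n1 as a placeholder device:

```shell
# Show Kyber's latency targets if it is active on the example device.
q=/sys/block/nvme0n1/queue/iosched
if [ -r "$q/read_lat_nsec" ]; then
  echo "read latency target:  $(cat "$q/read_lat_nsec") ns"
  echo "write latency target: $(cat "$q/write_lat_nsec") ns"
else
  echo "kyber not active on nvme0n1 (or device absent)"
fi
```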
Pros:
- Fast and efficient for NVMe/SSDs.
- Low CPU overhead (better than BFQ for servers).
Cons:
- Less configurable than Deadline or BFQ.
Best For:
- High-performance storage (NVMe, enterprise SSDs).
- Server workloads (databases, virtualization) where low latency and throughput are critical.
MQ-Deadline (Multi-Queue Deadline)
Overview: The multi-queue version of the Deadline scheduler, designed for blk-mq systems. It retains Deadline’s core logic but scales better with multi-core CPUs.
Algorithm:
- Splits requests into multiple queues (one per CPU core) to reduce lock contention.
- Enforces read/write deadlines and sorts requests per queue.
Pros:
- Better parallelism than single-queue Deadline.
- Ideal for modern multi-core systems and high-speed storage.
Cons:
- Less fairness than BFQ (may starve low-priority tasks).
Best For:
- Servers with multi-core CPUs and fast storage (e.g., NVMe databases, virtualization hosts).
Factors to Consider When Choosing an I/O Scheduler
Selecting the right scheduler depends on your hardware and workload. Here are key factors to weigh:
1. Storage Type
- HDD: Prioritize schedulers with seek-time optimization (Deadline, MQ-Deadline). Avoid Noop (no sorting).
- SSD/NVMe: Use low-overhead schedulers (none/Noop, Kyber) or latency-focused ones (BFQ for desktops, MQ-Deadline for servers).
2. Workload Type
- Interactive (Desktop/Laptop): BFQ (low latency for apps like browsers, video players).
- Server (Database/VMs): MQ-Deadline or Kyber (throughput + low latency).
- Batch/Background (Backups): Deadline (balances throughput and fairness).
- Real-Time: none/Noop (predictable FIFO behavior).
3. Latency vs. Throughput
- Latency-sensitive (e.g., gaming, databases): BFQ, Kyber, or MQ-Deadline.
- Throughput-sensitive (e.g., large file transfers): Deadline or MQ-Deadline.
4. Kernel Version
Some schedulers are kernel-dependent:
- CFQ and Noop were removed in kernel 5.0 along with the legacy block layer (replaced by BFQ, Kyber, and none).
- BFQ and Kyber are available in 4.12+.
- MQ-Deadline is available in 4.11+ and requires blk-mq; from 5.0 onward, only the multi-queue schedulers (none, mq-deadline, bfq, kyber) remain.
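To see which schedulers your kernel actually offers, check the kernel version and the per-device scheduler lists in sysfs:

```shell
# The kernel version determines which schedulers are compiled in.
uname -r

# Each device advertises its available schedulers; the active one is in [].
for f in /sys/block/*/queue/scheduler; do
  [ -r "$f" ] && printf '%s: %s\n' "$f" "$(cat "$f")"
done
true  # succeed even if no readable devices were found
```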
How to View and Change the I/O Scheduler
Linux lets you view and modify the I/O scheduler for individual storage devices (e.g., /dev/sda, /dev/nvme0n1). Here’s how:
Step 1: Identify Your Storage Device
List all block devices with:
lsblk
Note the device name (e.g., sda for an HDD, nvme0n1 for NVMe).
Step 2: View the Current Scheduler
Check the active scheduler for a device (e.g., sda):
cat /sys/block/sda/queue/scheduler
Output example (current scheduler in []). On a legacy single-queue kernel:
noop [deadline] cfq
On a blk-mq kernel (5.0+):
[mq-deadline] kyber bfq none
Step 3: Change the Scheduler Temporarily
To switch schedulers (e.g., to bfq for sda), write the scheduler name to the scheduler file:
echo bfq | sudo tee /sys/block/sda/queue/scheduler
Note: This resets after a reboot.
Step 4: Change the Scheduler Permanently
To make the change persistent, use one of these methods:
Method 1: GRUB (For All Devices)
Edit /etc/default/grub and add elevator=<scheduler> to GRUB_CMDLINE_LINUX_DEFAULT:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=bfq"
Note: the elevator= parameter only affects legacy (pre-5.0) single-queue kernels; on blk-mq-only kernels it is ignored, so prefer the udev method below.
Update GRUB and reboot:
sudo update-grub
sudo reboot
Method 2: Udev Rules (Per-Device)
Create a udev rule to set the scheduler for a specific device (e.g., sda):
sudo nano /etc/udev/rules.d/60-io-scheduler.rules
Add:
ACTION=="add|change", KERNEL=="sda", ATTR{queue/scheduler}="mq-deadline"
Reboot or reload udev rules:
sudo udevadm control --reload-rules
sudo udevadm trigger
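A rule can also match on device attributes rather than a fixed name. For example (illustrative only, using modern blk-mq scheduler names), assigning schedulers by whether the device is rotational:

```
# /etc/udev/rules.d/60-io-scheduler.rules (illustrative)
# Rotational disks (HDDs) -> mq-deadline; non-rotational (SSD/NVMe) -> none
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
ACTION=="add|change", KERNEL=="sd[a-z]|nvme[0-9]n[0-9]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
```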
Performance Tuning Tips
Even with the right scheduler, tuning can further boost performance:
1. Align I/O with SSD Erase Blocks
SSDs perform best when partitions and I/O are aligned to the drive’s erase block size (typically 128KB–1MB; check vendor documentation). Use lsblk -o NAME,PHY-SEC,LOG-SEC to check sector sizes; modern partitioning tools align partitions to 1MiB by default, which satisfies common erase block sizes.
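Sector sizes and alignment offsets can be inspected with lsblk (columns from util-linux; ALIGNMENT shows the alignment offset in bytes, 0 meaning aligned):

```shell
# PHY-SEC / LOG-SEC: physical and logical sector sizes in bytes.
# ALIGNMENT: alignment offset; 0 means the device/partition is aligned.
lsblk -o NAME,PHY-SEC,LOG-SEC,ALIGNMENT
```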
2. Adjust Deadline Parameters
Tweak read_expire (default 500ms) and write_expire (default 5000ms) for Deadline/MQ-Deadline:
# Shorten read deadline for faster response (e.g., 250ms)
echo 250 | sudo tee /sys/block/sda/queue/iosched/read_expire
3. Enable BFQ’s Low-Latency Mode
For desktops, enable BFQ’s low_latency mode:
echo 1 | sudo tee /sys/block/sda/queue/iosched/low_latency
4. Optimize Queue Depth
For NVMe, increase the queue depth (number of pending requests) to utilize parallelism:
echo 256 | sudo tee /sys/block/nvme0n1/queue/nr_requests
Conclusion
Linux I/O schedulers are powerful tools for optimizing storage performance, but there’s no “one-size-fits-all” solution. The key is to match the scheduler to your hardware (HDD vs. SSD) and workload (desktop vs. server).
- HDDs: Use Deadline or MQ-Deadline (seek optimization).
- SSDs/NVMe: none/Noop (low overhead) or Kyber (low latency).
- Desktops: BFQ (interactive responsiveness).
- Servers: MQ-Deadline or Kyber (throughput + parallelism).
Always test changes in a non-critical environment—measure latency and throughput with tools like fio or iostat to validate improvements. With the right scheduler, you can unlock your storage device’s full potential.
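As a starting point for that validation, here is a hedged fio sketch: a small 4KiB random-read test against a temporary file (tune size, runtime, and iodepth for real measurements on your actual storage path):

```shell
# Small random-read test; requires the fio package to be installed.
if command -v fio >/dev/null 2>&1; then
  fio --name=randread --ioengine=psync --rw=randread --bs=4k \
      --size=32M --filename=/tmp/fio-test.bin
  rm -f /tmp/fio-test.bin
else
  echo "fio not installed (e.g. apt install fio)"
fi
```

Compare the reported latency percentiles and bandwidth before and after switching schedulers to confirm the change actually helps your workload.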