Table of Contents
- What is Batch Processing?
- Why Bash for Batch Processing?
- Core Concepts in Bash Batch Processing
- Variables and Quoting
- Loops (For, While, Until)
- Conditionals (If-Else, Case Statements)
- Functions
- Practical Batch Processing Examples
- Example 1: File Management (Renaming, Archiving)
- Example 2: Log Analysis and Reporting
- Example 3: Automated System Backups
- Advanced Techniques
- Scheduling with Cron
- Error Handling and Debugging
- Parallel Processing
- Best Practices for Bash Batch Scripts
- Conclusion
- References
1. What is Batch Processing?
Batch processing is a method of executing a series of non-interactive tasks (called a “batch”) automatically, without user intervention. Unlike interactive processing (e.g., typing commands in a terminal), batch jobs run in the background, often scheduled for off-peak hours, and handle repetitive or resource-intensive tasks.
Key Characteristics of Batch Processing:
- Unattended Execution: Runs without user input once started.
- Repetitive Tasks: Ideal for recurring jobs (e.g., daily backups, weekly reports).
- Resource Efficiency: Can process large datasets or multiple tasks sequentially or in parallel.
- Consistency: Eliminates human error by standardizing workflows.
In Linux, batch processing is typically implemented using shell scripts (Bash, Zsh) and scheduling tools like cron.
2. Why Bash for Batch Processing?
Bash is the de facto standard for Linux automation, and for good reason:
- Ubiquity: Preinstalled on nearly all Linux/Unix systems (no extra setup needed).
- Integration: Seamlessly works with Linux command-line tools (grep, awk, sed, rsync, etc.).
- Scripting Power: Supports variables, loops, conditionals, functions, and error handling.
- Flexibility: Can call other programming languages (Python, Perl) or binaries within scripts.
- Lightweight: Minimal overhead compared to heavyweight automation tools.
While alternatives like Python or Ansible exist, Bash remains unparalleled for simple-to-moderate automation tasks due to its simplicity and direct access to system utilities.
3. Core Concepts in Bash Batch Processing
Before diving into scripts, let’s cover foundational Bash concepts you’ll use in batch processing.
Variables and Quoting
Variables store data for reuse. Use VAR=value to define (no spaces around the =), and $VAR to access.
# Define a variable
GREETING="Hello, Batch Processing!"
# Access it
echo "$GREETING" # Output: Hello, Batch Processing!
Quoting prevents word splitting and preserves spaces:
- Double quotes (" "): Allow variable expansion (e.g., "$GREETING").
- Single quotes (' '): Treat everything as literal (e.g., '$GREETING' outputs $GREETING).
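The difference is easiest to see with a value that contains runs of spaces; a minimal sketch:

```shell
#!/bin/bash
msg="two  spaces   here"

# Unquoted: the shell word-splits and collapses the runs of spaces
echo $msg        # Output: two spaces here

# Double quotes: the variable expands, spacing preserved
echo "$msg"      # Output: two  spaces   here

# Single quotes: no expansion; the text is literal
echo '$msg'      # Output: $msg
```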
Loops
Loops automate repetitive tasks. Common types:
for Loop: Iterate over a list
# Process all .txt files in a directory
for file in *.txt; do
echo "Processing $file"
# Add logic here (e.g., cat "$file", grep "error" "$file")
done
while Loop: Run until a condition fails
# Count from 1 to 5
count=1
while [ $count -le 5 ]; do
echo "Count: $count"
count=$((count + 1)) # Increment count
done
Conditionals
Conditionals control script flow based on logic.
if-else Statements
Check conditions with [ ] (test) or [[ ]] (Bash-specific, supports patterns).
file="data.log"
if [ -f "$file" ]; then # -f checks if file exists and is a regular file
echo "$file exists."
elif [ -d "$file" ]; then # -d checks if directory
echo "$file is a directory."
else
echo "$file not found."
fi
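One thing `[[ ]]` adds over `[ ]` is glob-pattern matching on the right-hand side of ==; a quick sketch (the filename is made up):

```shell
#!/bin/bash
file="error_2023.log"

# [[ ]] matches glob patterns; plain [ ] cannot do this
if [[ "$file" == *.log ]]; then
    echo "$file is a log file"
fi
```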
case Statement: Match patterns
Useful for multiple condition checks:
day=$(date +%A) # Get current day (e.g., "Monday")
case $day in
Monday|Wednesday|Friday)
echo "Workout day!"
;;
Saturday|Sunday)
echo "Rest day!"
;;
*) # Default case
echo "Regular day."
;;
esac
Functions
Functions modularize code for reusability.
# Define a function to backup a file
backup_file() {
local file=$1 # First argument
if [ -f "$file" ]; then
cp "$file" "$file.bak"
echo "Backed up $file to $file.bak"
else
echo "Error: $file not found"
return 1 # Return non-zero exit code for failure
fi
}
# Call the function
backup_file "important.txt"
4. Practical Batch Processing Examples
Let’s apply the core concepts to real-world scenarios.
Example 1: File Management (Bulk Renaming & Archiving)
Suppose you have hundreds of .jpg photos named DSC_0001.jpg, DSC_0002.jpg, etc., and you want to:
- Rename them to vacation_001.jpg, vacation_002.jpg, …
- Archive the renamed files into a tar.gz.
Script: organize_photos.sh
#!/bin/bash
# Purpose: Rename and archive vacation photos
# Configuration
SOURCE_DIR="./photos" # Directory with raw photos
DEST_DIR="./organized_vacation" # Output directory
PREFIX="vacation" # Rename prefix
ARCHIVE_NAME="vacation_archive.tar.gz"
# Create destination directory if it doesn't exist
mkdir -p "$DEST_DIR"
# Rename files with padded numbers (001, 002, ...)
count=1
for file in "$SOURCE_DIR"/*.jpg; do
# Skip if not a file (e.g., if no .jpg files exist)
[ -f "$file" ] || continue
# Pad count to 3 digits (001 instead of 1)
new_name="${PREFIX}_$(printf "%03d" $count).jpg"
# Copy (or move with 'mv') to destination
cp "$file" "$DEST_DIR/$new_name"
echo "Renamed: $file -> $DEST_DIR/$new_name"
((count++)) # Increment count
done
# Archive the organized photos
tar -czf "$ARCHIVE_NAME" -C "$DEST_DIR" .
echo "Created archive: $ARCHIVE_NAME"
How to Use:
- Save as organize_photos.sh.
- Make executable: chmod +x organize_photos.sh.
- Run: ./organize_photos.sh.
Explanation:
- mkdir -p: Creates DEST_DIR and parent directories if missing.
- printf "%03d" $count: Pads numbers to 3 digits (e.g., 1 → 001).
- tar -czf: Creates a compressed archive (c=create, z=gzip, f=file).
Example 2: Log Analysis and Reporting
Servers generate gigabytes of logs. Let’s automate parsing Apache logs to count 404 errors and generate a daily report.
Sample Apache Log Format (simplified):
192.168.1.1 - - [10/Oct/2023:12:34:56 +0000] "GET /page.html HTTP/1.1" 200 1234
192.168.1.2 - - [10/Oct/2023:12:35:10 +0000] "GET /missing.html HTTP/1.1" 404 567
Script: analyze_apache_logs.sh
#!/bin/bash
# Purpose: Analyze Apache logs for 404 errors and generate a report
# Configuration
LOG_FILE="/var/log/apache2/access.log"
REPORT_DIR="./reports"
TODAY=$(date +%Y-%m-%d) # Current date (e.g., 2023-10-10)
REPORT_FILE="$REPORT_DIR/apache_404_report_$TODAY.txt"
# Create report directory
mkdir -p "$REPORT_DIR"
# Check if log file exists
if [ ! -f "$LOG_FILE" ]; then
echo "Error: Log file $LOG_FILE not found!"
exit 1 # Exit with error code 1
fi
# Extract 404 errors (the status code is the 9th whitespace-separated field in Apache's common log format)
# Use awk to filter lines whose status is 404 and extract IP, timestamp, URL
echo "Generating 404 report for $TODAY..."
awk '$9 == 404 {print "IP: " $1 ", Time: " $4 ", URL: " $7}' "$LOG_FILE" > "$REPORT_FILE"
# Count total 404s
TOTAL_404=$(wc -l < "$REPORT_FILE")
# Add summary to the report
echo -e "\nTotal 404 Errors: $TOTAL_404" >> "$REPORT_FILE"
echo "Report generated: $REPORT_FILE"
Key Tools Used:
- awk: Powerful text processor; $9 == 404 filters lines where the 9th field (status code) is 404.
- wc -l: Counts lines in the report to get total errors.
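As a hypothetical extension of the same awk pattern, you could also rank the URLs that 404 most often (the sample log lines here are written to a temp file purely for illustration; point the pipeline at your real access log instead):

```shell
#!/bin/bash
# Rank the most frequent 404 URLs in an Apache-style log.
# A temp file with sample lines stands in for the real access log.
LOG_FILE=$(mktemp)
cat > "$LOG_FILE" <<'EOF'
192.168.1.1 - - [10/Oct/2023:12:34:56 +0000] "GET /missing.html HTTP/1.1" 404 567
192.168.1.2 - - [10/Oct/2023:12:35:10 +0000] "GET /missing.html HTTP/1.1" 404 567
192.168.1.3 - - [10/Oct/2023:12:36:00 +0000] "GET /ok.html HTTP/1.1" 200 100
EOF

# $7 is the URL; count duplicates, most frequent first
awk '$9 == 404 {print $7}' "$LOG_FILE" | sort | uniq -c | sort -rn | head -5

rm -f "$LOG_FILE"
```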
Example 3: Automated System Backups
Backups are critical. Let’s create a script to back up /home and /etc to an external drive, with incremental backups (only new/changed files).
Script: system_backup.sh
#!/bin/bash
# Purpose: Incremental backup of /home and /etc using rsync
# Configuration
SOURCE_DIRS="/home /etc" # Directories to back up
DEST="/mnt/external_drive/backups" # Backup destination
DATE=$(date +%Y%m%d) # Current date (e.g., 20231010)
BACKUP_DIR="$DEST/full_$DATE" # Full backup directory
LINK_DEST="$DEST/latest" # Link to previous backup (for incremental)
# Check if destination is mounted
if ! mountpoint -q "$DEST"; then
echo "Error: $DEST is not mounted!"
exit 1
fi
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Use rsync for incremental backup:
# -a: Archive mode (preserve permissions, ownership, etc.)
# -h: Human-readable output
# --link-dest: Hardlink to previous backup (saves space for unchanged files)
# Note: $SOURCE_DIRS is deliberately unquoted so it splits into the two source paths
rsync -ah --link-dest="$LINK_DEST" $SOURCE_DIRS "$BACKUP_DIR"
# Update "latest" symlink to point to the new backup
ln -snf "$BACKUP_DIR" "$LINK_DEST"
echo "Backup completed successfully. Stored in: $BACKUP_DIR"
How It Works:
- rsync --link-dest: Creates hardlinks to files from the previous backup ($LINK_DEST) if they haven't changed, saving disk space.
- ln -snf: Updates the latest symlink to point to the new backup, making it easy to access the most recent version.
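A common companion step (an assumption here, not part of the script above) is pruning dated backups older than a retention window; this sketch assumes the same full_YYYYMMDD layout:

```shell
#!/bin/bash
# Prune dated backup directories older than KEEP_DAYS.
# DEST and KEEP_DAYS are assumptions; adjust to your setup.
DEST="/mnt/external_drive/backups"
KEEP_DAYS=30

# Exit quietly if the destination is absent (e.g., drive not mounted)
[ -d "$DEST" ] || exit 0

# -maxdepth 1 stops find descending into the backups themselves;
# -mtime +"$KEEP_DAYS" matches directories untouched for more than KEEP_DAYS days
find "$DEST" -maxdepth 1 -type d -name 'full_*' -mtime +"$KEEP_DAYS" -exec rm -rf {} +
```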
5. Advanced Techniques
Once you master basics, these advanced techniques will elevate your scripts.
Scheduling with Cron
To run batch jobs automatically (e.g., daily backups at 2 AM), use cron, Linux’s job scheduler.
Cron Syntax:
* * * * * command_to_run
| | | | |
| | | | +-- Day of the week (0=Sun, 6=Sat, or 1=Mon, 7=Sun)
| | | +---- Month (1-12)
| | +------ Day of the month (1-31)
| +-------- Hour (0-23)
+---------- Minute (0-59)
Common Special Characters:
- *: Every value (e.g., * in the minute field → every minute).
- */5: Every 5 units (e.g., */5 * * * * → every 5 minutes).
- 3,15: Specific values (e.g., 3,15 * * * * → at 3 and 15 minutes past the hour).
Example: Schedule the Backup Script Daily at 2 AM
- Edit crontab: crontab -e (use sudo crontab -e for system-wide jobs).
- Add: 0 2 * * * /path/to/system_backup.sh >> /var/log/backup.log 2>&1
- 0 2 * * *: Run at 2:00 AM daily.
- >> /var/log/backup.log 2>&1: Append output and errors to a log file.
Error Handling
Prevent silent failures with robust error handling:
set -e: Exit on Error
Add set -e at the top of your script to exit immediately if any command fails:
#!/bin/bash
set -e # Exit if any command fails
cp file1.txt /nonexistent/dir # Fails → script exits here
echo "This line won't run"
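Many scripts go a step further than set -e alone; a common (optional) strict-mode preamble combines three options:

```shell
#!/bin/bash
# -e: exit immediately if any command fails
# -u: treat expansion of an unset variable as an error
# -o pipefail: a pipeline fails if ANY stage fails, not just the last one
set -euo pipefail

# Without pipefail, 'false | true' would succeed, because only the
# last command's exit status counts; with it, the whole pipeline fails.
echo "strict mode enabled"
```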
trap: Clean Up on Exit
Use trap to run commands (e.g., clean up temp files) when the script exits:
#!/bin/bash
TMP_FILE=$(mktemp) # Create temp file
# Clean up temp file on exit (normal or error)
trap 'rm -f "$TMP_FILE"; echo "Cleaned up temp file"' EXIT
# Do work with $TMP_FILE...
echo "Data" > "$TMP_FILE"
Parallel Processing
Speed up batch jobs by running tasks in parallel.
xargs -P: Parallelize with xargs
xargs -P N runs up to N processes in parallel.
Example: Resize images in parallel
# Resize all .png images to 50% size, 4 processes at a time (requires ImageMagick's convert)
# -print0 / -0 keep filenames with spaces or newlines intact
find ./images -name "*.png" -print0 | xargs -0 -I {} -P 4 convert {} -resize 50% {}.resized.png
GNU Parallel
For more control, use GNU Parallel (install with sudo apt install parallel):
# Run backup script for 5 servers in parallel
parallel -j 5 ./backup_server.sh {} ::: server1 server2 server3 server4 server5
6. Best Practices for Bash Batch Scripts
- Comment Liberally: Explain why (not just what) the code does.
- Use Variables for Configuration: Avoid hard-coded paths (e.g., SOURCE_DIR instead of ./photos).
- Validate Inputs: Check if files/directories exist before processing (e.g., [ -f "$FILE" ]).
- Test with echo: Add echo before critical commands (e.g., echo "rm $FILE") to preview actions.
- Avoid Wildcards in rm/mv: Use rm -i (interactive) during testing, or find ... -delete for safety.
- Sanitize User Input: If accepting arguments, validate them (e.g., if [ -z "$1" ]; then echo "Usage: $0 <file>"; exit 1; fi).
- Version Control: Store scripts in Git for tracking changes.
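Several of these practices combine naturally into a small skeleton (process_file is a hypothetical helper written for this sketch, not one of the article's scripts):

```shell
#!/bin/bash
# A small skeleton combining input sanitizing, validation, and clear errors.
# process_file is a hypothetical helper, not part of the article's scripts.
process_file() {
    local file="$1"

    # Sanitize input: refuse to run without an argument
    if [ -z "$file" ]; then
        echo "Usage: process_file <file>" >&2
        return 1
    fi

    # Validate input: check the file exists before touching it
    if [ ! -f "$file" ]; then
        echo "Error: $file not found" >&2
        return 1
    fi

    echo "Processing $file"
}

# Demonstrate on a temporary file
sample=$(mktemp)
process_file "$sample"
rm -f "$sample"
```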
7. Conclusion
Batch processing with Bash is a superpower for Linux users and sysadmins. From renaming files to automating backups, Bash scripts turn tedious tasks into one-click (or scheduled) operations. By mastering variables, loops, conditionals, and advanced tools like cron and rsync, you’ll save hours of manual work and reduce errors.
Start small: automate a daily task (e.g., cleaning downloads), then gradually tackle more complex workflows. The more you practice, the more creative and efficient your scripts will become!
8. References
- GNU Bash Manual
- Cron How-To
- rsync Man Page
- GNU Parallel Tutorial
- Book: “Bash Cookbook” by Carl Albing and JP Vossen
Happy scripting! 🚀