thelinuxvault guide

Developing Resilient Bash Automation Scripts for Linux

Bash scripting is a cornerstone of Linux automation, powering tasks from simple file backups to complex system orchestration. However, many scripts fail silently, break on edge cases, or leave systems in inconsistent states when faced with unexpected inputs, missing dependencies, or runtime errors. **Resilient bash scripts** are designed to handle these challenges gracefully: they validate inputs, recover from failures, log actions for debugging, and produce consistent results even when run multiple times. In this guide, we’ll explore the principles, techniques, and tools to build robust bash scripts that you can trust in production. Whether you’re automating server maintenance, deploying applications, or managing data pipelines, these practices will elevate your scripts from "works most of the time" to "reliable under pressure."

Table of Contents

  1. Understanding Resilience in Bash Scripts
  2. Key Principles for Resilient Scripts
  3. Error Handling: Anticipate and Recover
  4. Input Validation: Guard Against Bad Data
  5. Environment Management: Control Your Context
  6. Logging and Debugging: Illuminate the Black Box
  7. Idempotency: Run Safely, Run Repeatedly
  8. Testing and Validation: Ensure Reliability
  9. Advanced Resilience Techniques
  10. Real-World Example Script
  11. Conclusion

1. Understanding Resilience in Bash Scripts

Resilience in bash scripting refers to a script’s ability to:

  • Handle errors gracefully (e.g., missing files, failed commands) instead of crashing.
  • Adapt to changing environments (e.g., different Linux distributions, varying user permissions).
  • Produce consistent results when run multiple times (idempotency).
  • Simplify debugging through clear logging and error messages.

A fragile script might fail with a cryptic command not found error, overwrite critical files, or leave temporary files behind. A resilient script, by contrast, would check for dependencies upfront, validate inputs, and clean up resources even if interrupted.
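To make the contrast concrete, here is a minimal sketch of the dependency check described above (the `require` helper and the tool names are illustrative, not any standard API):

```shell
#!/usr/bin/env bash
set -u

# A reusable guard: fail fast with an actionable message instead of
# crashing mid-run with a cryptic "command not found".
require() {
  command -v "$1" > /dev/null 2>&1 \
    || { echo "ERROR: '$1' is required but not installed." >&2; return 1; }
}

# In a real script you would typically exit on failure; `return 1` is used
# here so the demonstration can continue past the missing tool.
require tar && echo "tar found"
require definitely_missing_tool || echo "caught missing dependency"
```

Calling `require` once per external command at the top of a script surfaces all missing dependencies before any work begins.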

2. Key Principles for Resilient Scripts

Before diving into techniques, adopt these foundational principles:

  • Defensive Programming: Assume the worst. Validate inputs, check for failures, and never trust external data (e.g., user input, file contents).
  • Least Privilege: Run scripts with the minimum permissions required. Avoid sudo unless absolutely necessary, and restrict write access to critical directories.
  • Transparency: Log actions and errors so you can trace issues later.
  • Simplicity: Keep scripts focused and readable. Complexity increases the chance of bugs.

3. Error Handling: Anticipate and Recover

Bash by default ignores most errors (e.g., a failed cp command won’t stop the script). Use these tools to enforce strict error handling:

3.1 Enable Strict Error Checking

Add these set options at the top of your script (after the shebang) to make bash more strict:

#!/usr/bin/env bash
set -Eeuo pipefail

  • -E: Make the ERR trap fire inside functions, command substitutions, and subshells (it is not inherited by default).
  • -e: Exit immediately if any command fails (non-zero exit code).
  • -u: Treat unset variables as errors (avoids undefined variable bugs).
  • -o pipefail: Make a pipeline fail if any command in the pipeline fails (by default, only the last command’s exit code matters).
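The effect of pipefail is easy to demonstrate. The snippet below is a standalone sketch; it deliberately omits set -e so the shell survives the failing pipeline and can report both exit codes:

```shell
#!/usr/bin/env bash

# Default behavior: a pipeline's exit status is the LAST command's status,
# so the failure of `false` is silently swallowed by `true`.
false | true
echo "without pipefail: $?"   # prints 0

# With pipefail: the pipeline fails if ANY stage fails.
set -o pipefail
false | true
echo "with pipefail: $?"      # prints 1
```

Combined with set -e, the second pipeline would terminate the script immediately instead of letting the failure pass unnoticed.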

3.2 Custom Error Handling with trap

Use trap to run cleanup code or handle signals (e.g., SIGINT for Ctrl+C). For example, clean up temporary files on exit:

# Define cleanup function
cleanup() {
  echo "Cleaning up temporary files..." >&2
  rm -f /tmp/my_script_temp.txt
}

# EXIT fires on normal completion and, under set -e, on failures too.
# Re-raising SIGINT as a plain exit routes Ctrl+C through the same path,
# and cleanup itself never calls exit, so the script's real exit code survives.
trap cleanup EXIT
trap 'exit 130' SIGINT

3.3 Meaningful Error Messages

Avoid silent failures. Use stderr (file descriptor 2) for error messages to separate them from regular output:

error() {
  echo "ERROR: $*" >&2  # Redirect to stderr
  exit 1
}

# Example: Check if a file exists
file="/critical/data.txt"
[[ -f "$file" ]] || error "File $file not found. Aborting."

4. Input Validation: Guard Against Bad Data

User input, command-line arguments, and external files are common sources of failure. Validate all inputs before acting on them.

4.1 Validate Command-Line Arguments

Check for the correct number of arguments and their validity:

# Check if at least 1 argument is provided
if [[ $# -lt 1 ]]; then
  error "Usage: $0 <input_file>"
fi

input_file="$1"

# Check if input_file is a readable file
if [[ ! -f "$input_file" || ! -r "$input_file" ]]; then
  error "Input file $input_file is not a readable file."
fi

4.2 Sanitize User Input

Reject or sanitize inputs that could cause harm (e.g., path traversal attacks, unexpected special characters). For example, restrict a filename input to alphanumeric characters:

filename="$1"
if [[ ! "$filename" =~ ^[a-zA-Z0-9_]+$ ]]; then
  error "Invalid filename: $filename. Only letters, numbers, and underscores allowed."
fi

4.3 Use getopts for Complex Options

For scripts with flags (e.g., -v for verbose, -o <output>), use getopts to parse arguments cleanly:

output_dir="./output"
verbose=0

# Parse options: -o <dir>, -v
# The leading ":" puts getopts in silent mode, which is what routes invalid
# options to the \? case and missing arguments to the : case below
while getopts ":o:v" opt; do
  case "$opt" in
    o) output_dir="$OPTARG" ;;
    v) verbose=1 ;;
    \?) error "Invalid option: -$OPTARG" ;;
    :) error "Option -$OPTARG requires an argument." ;;
  esac
done
shift $((OPTIND -1))  # Remove parsed options from $@
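To see what the parsed values look like, the loop can be wrapped in a small function and invoked directly. This is a sketch only: `parse_args` is a hypothetical helper, and note the leading `:` in the optstring, which enables the silent error-reporting mode that the `\?` and `:` branches rely on:

```shell
#!/usr/bin/env bash

parse_args() {
  local OPTIND=1 opt   # reset OPTIND so the function can be called repeatedly
  output_dir="./output"
  verbose=0
  while getopts ":o:v" opt; do
    case "$opt" in
      o) output_dir="$OPTARG" ;;
      v) verbose=1 ;;
      \?) echo "Invalid option: -$OPTARG" >&2; return 1 ;;
      :) echo "Option -$OPTARG requires an argument." >&2; return 1 ;;
    esac
  done
  shift $((OPTIND - 1))
  positional=("$@")    # operands left over after the flags
}

parse_args -v -o /tmp/reports input.csv
echo "output_dir=$output_dir verbose=$verbose first=${positional[0]}"
# prints: output_dir=/tmp/reports verbose=1 first=input.csv
```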

5. Environment Management: Control Your Context

Scripts often fail due to environment differences (e.g., missing commands, incorrect PATH, or permission issues). Explicitly manage your environment to avoid surprises.

5.1 Use Absolute Paths

Avoid relying on the PATH variable for critical commands. Use absolute paths to ensure consistency:

# Fragile: Relies on "tar" being in PATH
tar -czf backup.tar.gz ./data

# Resilient: Uses absolute path (verify with `which tar`)
/usr/bin/tar -czf backup.tar.gz ./data

5.2 Check for Dependencies

Ensure required commands (e.g., curl, jq) exist before using them:

# Check if "curl" is installed
if ! command -v curl &> /dev/null; then
  error "curl is required but not installed. Install it with: sudo apt install curl"
fi

5.3 Set a Strict PATH

Avoid unintended command execution by restricting PATH to trusted directories:

# Set PATH explicitly to avoid running untrusted commands
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

5.4 Handle Environment Variables

Explicitly set or validate critical environment variables. For example, require a BACKUP_DIR variable:

# Require BACKUP_DIR to be set
if [[ -z "${BACKUP_DIR:-}" ]]; then
  error "BACKUP_DIR environment variable is not set."
fi

# Validate BACKUP_DIR is a writable directory
if [[ ! -d "$BACKUP_DIR" || ! -w "$BACKUP_DIR" ]]; then
  error "Backup directory $BACKUP_DIR is not a writable directory."
fi

6. Logging and Debugging: Illuminate the Black Box

Without logs, debugging a failed script is guesswork. Implement structured logging to track actions, errors, and runtime context.

6.1 Log to a File with Timestamps

Write logs to a dedicated file with timestamps for traceability:

LOG_FILE="/var/log/my_script.log"

# Log function with timestamp and log level
log() {
  local level="$1"
  local message="$2"
  local timestamp
  timestamp=$(date +"%Y-%m-%d %H:%M:%S")  # Separate declaration so a failed date isn't masked by local
  echo "[$timestamp] [$level] $message" >> "$LOG_FILE"
}

# Usage
log "INFO" "Starting backup process..."
log "ERROR" "Failed to connect to remote server."

6.2 Add Debugging Support

Let users enable debug mode to see command execution details. Use set -x or a custom debug function:

# Enable debug mode with -d flag
debug=0
if [[ "${1:-}" == "-d" ]]; then  # "${1:-}" avoids an unbound-variable error under set -u
  debug=1
  set -x  # Print commands and their arguments as they execute
  shift
fi

# Alternative: Custom debug function
debug() {
  if [[ $debug -eq 1 ]]; then
    echo "DEBUG: $*" >&2
  fi
}

debug "Input file: $input_file"

7. Idempotency: Run Safely, Run Repeatedly

An idempotent script can be run multiple times without causing unintended side effects (e.g., installing a package twice, creating duplicate files).

7.1 Check Before Acting

Before modifying the system, check if the action is already done:

# Idempotent package installation (Debian/Ubuntu)
install_package() {
  local pkg="$1"
  if ! dpkg -s "$pkg" &> /dev/null; then  # dpkg -s exits non-zero when the package is not installed
    log "INFO" "Installing $pkg..."
    sudo apt-get install -y "$pkg"
  else
    log "INFO" "$pkg is already installed. Skipping."
  fi
}

install_package "nginx"

7.2 Use Temporary Files for Modifications

When editing files, avoid overwriting them directly. Use a temporary file and replace the original only if the edit succeeds:

config_file="/etc/nginx/nginx.conf"

# Create a temp file with sed modifications; test the command directly,
# because under set -e a failed sed would exit the script before a
# separate `$?` check could ever run
temp_file=$(mktemp)
if sed 's/worker_processes auto;/worker_processes 4;/' "$config_file" > "$temp_file"; then
  mv "$temp_file" "$config_file"
  log "INFO" "Updated $config_file"
else
  rm -f "$temp_file"
  error "Failed to modify $config_file"
fi

7.3 Avoid Race Conditions

Use locks to prevent concurrent execution (e.g., two instances of the script modifying the same file):

lock_dir="/tmp/my_script.lock"

# mkdir is atomic: testing for the lock and creating it happen in a single
# step, so two instances can't both pass a separate existence check first
if ! mkdir "$lock_dir" 2> /dev/null; then
  error "Script is already running (lock exists: $lock_dir). Exiting."
fi

trap 'rmdir "$lock_dir"' EXIT  # Release lock on exit

# ... rest of script ...

8. Testing and Validation: Ensure Reliability

Even the best scripts have bugs. Test rigorously to catch issues early.

8.1 Lint with shellcheck

Use shellcheck (a static analysis tool) to catch syntax errors, bad practices, and portability issues:

# Install shellcheck (Debian/Ubuntu)
sudo apt-get install shellcheck

# Lint your script
shellcheck my_script.sh

Example output:

In my_script.sh line 5:
rm $old_file
   ^-- SC2086: Double quote to prevent globbing and word splitting.

8.2 Unit Testing with bats-core

Write unit tests for critical functions using bats-core (Bash Automated Testing System):

# Install bats-core
git clone https://github.com/bats-core/bats-core.git
cd bats-core && sudo ./install.sh /usr/local

# Example test file (my_script.bats)
@test "Script exits with error if no arguments" {
  run ./my_script.sh
  [ "$status" -eq 1 ]
  [ "$output" = "ERROR: Usage: ./my_script.sh <input_file>" ]
}

Run tests with:

bats my_script.bats

8.3 Dry Runs

Test scripts without making actual changes using “dry run” modes. For example, prefix destructive commands with echo to preview actions:

dry_run=0
if [[ "${1:-}" == "--dry-run" ]]; then  # "${1:-}" keeps set -u from aborting when no args are given
  dry_run=1
  shift
fi

# Dry run: echo the command instead of running it
if [[ $dry_run -eq 1 ]]; then
  echo "Would delete: $old_file"
else
  rm "$old_file"
fi

9. Advanced Resilience Techniques

9.1 Retry Transient Failures

For temporary issues (e.g., network timeouts), retry commands with backoff:

retry() {
  local retries=3
  local delay=2

  for ((i=1; i<=retries; i++)); do
    if "$@"; then  # Run the command directly; eval'ing a string re-splits and re-expands it
      return 0
    fi
    log "WARN" "Command failed (attempt $i/$retries). Retrying in $delay seconds..."
    sleep "$delay"
    delay=$((delay * 2))  # Exponential backoff
  done
  error "Command failed after $retries attempts: $*"
}

# Usage: Retry wget until it succeeds
retry wget https://example.com/large_file.tar.gz

9.2 Handle Signals Gracefully

Use trap to clean up resources when the script is interrupted (e.g., Ctrl+C):

trap 'echo "Script interrupted. Cleaning up..."; rm -f "$temp_file"; exit 1' SIGINT SIGTERM

9.3 Limit Resources

Prevent scripts from consuming excessive CPU/memory with ulimit:

# Limit script to 1GB of virtual memory and 60 seconds of CPU time
ulimit -v 1048576  # 1GB in KB
ulimit -t 60       # 60 seconds CPU time

10. Real-World Example Script

Below is a resilient script that backs up a directory to a remote server, incorporating error handling, logging, idempotency, and input validation.

#!/usr/bin/env bash
set -Eeuo pipefail

# --------------------------
# Configuration
# --------------------------
LOG_FILE="/var/log/backup_script.log"
REMOTE_USER="backupuser"
REMOTE_HOST="backupserver.example.com"
REMOTE_DIR="/backups"
MAX_RETRIES=3
RETRY_DELAY=2

# --------------------------
# Functions
# --------------------------
error() {
  echo "[$(date +"%Y-%m-%d %H:%M:%S")] [ERROR] $*" >&2
  echo "[$(date +"%Y-%m-%d %H:%M:%S")] [ERROR] $*" >> "$LOG_FILE"
  exit 1
}

log() {
  local level="$1"
  local message="$2"
  local timestamp
  timestamp=$(date +"%Y-%m-%d %H:%M:%S")
  echo "[$timestamp] [$level] $message" >> "$LOG_FILE"
  if [[ "$level" == "ERROR" || "$level" == "WARN" ]]; then
    echo "[$timestamp] [$level] $message" >&2
  fi
}

retry() {
  local attempts=0
  local delay=$RETRY_DELAY   # Work on a copy so the global default is not mutated
  while [[ $attempts -lt $MAX_RETRIES ]]; do
    if "$@"; then            # Execute the command directly; no eval, no nested-quoting pitfalls
      return 0
    fi
    attempts=$((attempts + 1))
    log "WARN" "Command failed (attempt $attempts/$MAX_RETRIES). Retrying in $delay seconds..."
    sleep "$delay"
    delay=$((delay * 2))
  done
  error "Command failed after $MAX_RETRIES attempts: $*"
}

# --------------------------
# Input Validation
# --------------------------
if [[ $# -ne 1 ]]; then
  error "Usage: $0 <source_dir>"
fi

source_dir="$1"

if [[ ! -d "$source_dir" || ! -r "$source_dir" ]]; then
  error "Source directory $source_dir is not a readable directory."
fi

# --------------------------
# Dependency Check
# --------------------------
if ! command -v rsync &> /dev/null; then
  error "rsync is required but not installed. Install with: sudo apt install rsync"
fi

# --------------------------
# Idempotent Backup
# --------------------------
backup_name="backup_$(date +%Y%m%d_%H%M%S).tar.gz"
remote_path="$REMOTE_USER@$REMOTE_HOST:$REMOTE_DIR/$backup_name"

log "INFO" "Starting backup of $source_dir to $remote_path..."

# Use rsync with compression and retry on failure (arguments are passed
# through as-is, so paths containing spaces stay intact)
retry rsync -avz --delete "$source_dir/" "$remote_path"

log "INFO" "Backup completed successfully: $remote_path"

11. Conclusion

Developing resilient bash scripts requires a proactive mindset: anticipate failures, validate inputs, and design for recovery. By combining strict error handling, idempotent actions, detailed logging, and rigorous testing, you can create scripts that reliably automate tasks in even the most unpredictable environments.

Start small: adopt set -Eeuo pipefail, add input validation, and implement logging. Over time, layer in advanced techniques like retries and unit testing. Your future self (and your colleagues) will thank you when troubleshooting a 2 AM production issue!
