Upgrade & Secure Your Future with DevOps, SRE, DevSecOps, MLOps!

We spend hours scrolling social media and waste money on things we forget, but won’t spend 30 minutes a day earning certifications that can change our lives.
Master in DevOps, SRE, DevSecOps & MLOps by DevOpsSchool!

Learn from Guru Rajesh Kumar and double your salary in just one year.


Get Started Now!

Database Backup & Recovery Plan – A Complete Guide

I’ll give you a full, all-in-one guide for production-grade MySQL (and general relational DB) backups that covers:

  • All common strategies and tools
  • How to combine full, incremental, and binlog-based backups
  • Detecting changes since last backup (down to table/row level, where possible)
  • Restoring easily and reliably
  • Addressing hacking/data corruption recovery scenarios
  • Pros, cons, and best scenarios for each method

This will be relevant for MySQL/MariaDB primarily, but the approaches will map to Postgres, MS SQL, Oracle, etc. I’ll also touch on cloud-native and container-native (K8s) scenarios, and key security best practices.


1. Why “Just Full Dumps” Are Not Enough

  • Full dumps (mysqldump, etc.) are simple and give you point-in-time backups, but:
    • They miss changes since the last dump (can lose days of data).
    • They are SLOW for large DBs.
    • You cannot easily audit changes between dumps.

In production, you need:

  • Frequent, efficient backups (including incremental)
  • Change tracking/auditing
  • Rapid, reliable restores (including point-in-time)
  • Ability to detect or recover from corruption/hacking, ideally restoring to “last good” state
  • Table-level (or even row-level) recovery, if possible

2. All Major Production Database Backup Strategies

A. Logical Backups (Full Dumps)

  • Tool: mysqldump, mysqlpump, mydumper
  • What: Exports data as SQL (CREATE, INSERT statements), easy to move/copy.
  • Pros:
    • Human-readable, portable across versions
    • Easy to script and automate
    • Good for small/medium DBs, or schema-only backups
  • Cons:
    • Slow for large DBs
    • Can miss changes made during dump (unless locking/consistent snapshot)
    • Inefficient for incremental backups
    • No automatic change tracking/audit
  • Best for: Small/medium DBs, migrations, offsite copies
  • Restore: Easy (just import SQL)
  • Tools: mysqldump, mysqlpump, mydumper (faster, parallel)
  • Example: mysqldump -u root -p --single-transaction --all-databases > all_databases.sql

B. Physical (Hot) Backups

  • Tool: Percona XtraBackup, MySQL Enterprise Backup, LVM snapshots, filesystem-level copy with flush
  • What: Copies DB files exactly as on disk, can be done live.
  • Pros:
    • Very fast, even for huge DBs (no export/import step)
    • Supports incremental and differential backups
    • Enables point-in-time recovery when combined with binlogs
  • Cons:
    • More complex to restore (need compatible server version)
    • Can’t restore to a different RDBMS or different major version
    • File-level, so OS/FS-specific
  • Best for: Large production DBs, when uptime is crucial
  • Restore: Fast, but requires stopping server (copy files back, apply logs)
  • Tools:
    • Percona XtraBackup (open source, highly recommended)
    • MySQL Enterprise Backup (paid)
    • LVM/ZFS snapshots (requires filesystem support)
  • Example: xtrabackup --backup --target-dir=/data/backups/2024-07-06 # To prepare and restore, follow XtraBackup docs

C. Binary Log (Binlog) Based Backups

  • Tool: MySQL binary logs, mysqlbinlog
  • What: Binlogs record all changes (insert/update/delete) as they happen
  • Pros:
    • Enables point-in-time recovery (PITR)
    • Can replay all changes up to a precise point (before corruption/hack)
    • Used for replication and auditing
  • Cons:
    • Needs binlog enabled/configured
    • Need full backup to start from, then replay binlogs
    • Can be complex to manage for large change volumes
  • Best for: Recovering to “just before” an incident; critical for audit trails
  • Restore:
    1. Restore last full/incremental backup
    2. Apply binlogs up to desired timestamp/transaction
  • Tools:
    • mysqlbinlog
    • XtraBackup (can help with binlog application)
  • Example: mysqlbinlog --start-datetime="2024-07-05 10:00:00" --stop-datetime="2024-07-06 13:00:00" binlog.000123 | mysql -u root -p

D. Incremental & Differential Backups

  • Tool: Percona XtraBackup, MySQL Enterprise Backup, some cloud-native solutions
  • What: Only backs up changes since last backup (incremental: since last any backup, differential: since last full backup)
  • Pros:
    • Saves time, bandwidth, storage
    • Enables granular restore
  • Cons:
    • Need chain of backups to restore (full + all incrementals since)
    • Complex management for many backups
  • Best for: Large, active DBs where daily full backups are too costly
  • Restore: Restore full, then each incremental in order
  • Tools:
    • XtraBackup incremental mode
  • Example: # Full backup xtrabackup --backup --target-dir=/data/backups/base # Incremental backup xtrabackup --backup --target-dir=/data/backups/inc1 --incremental-basedir=/data/backups/base

E. Snapshots (LVM/ZFS/Cloud)

  • Tool: LVM, ZFS, EBS Snapshots (AWS), GCP PD Snapshots, K8s CSI snapshots
  • What: Filesystem or block-device level snapshot, typically near-instant
  • Pros:
    • Extremely fast (seconds/minutes)
    • Minimal performance hit
    • Great for DR, can be automated and stored offsite
  • Cons:
    • Often not application-aware (must flush/suspend DB writes, or use MySQL FLUSH TABLES)
    • Restores the entire volume (not table-level)
    • Can be expensive for frequent snapshots
  • Best for: Large, cloud-hosted DBs, disaster recovery
  • Restore: Attach volume or mount snapshot, copy back data
  • Tools:
    • lvm snapshot, zfs snapshot, AWS CLI, GCP CLI, K8s VolumeSnapshot CRD
  • Example: lvcreate --snapshot --size 10G --name mysql_snap /dev/vg0/mysql

F. Change Data Capture (CDC)/Audit Logging

  • Tool: Binlogs, MySQL audit plugins, third-party CDC tools (Debezium, Maxwell)
  • What: Continuously records/logs all changes, not just for backup but for auditing or replication
  • Pros:
    • Full change visibility (who/what/when)
    • Real-time integration possible (stream to SIEM, analytics, etc.)
  • Cons:
    • Not a backup itself, but a supplement
    • Storage/management of logs
  • Best for: Security, compliance, forensic investigations
  • Restore: Not for restore, but can reconstruct data or investigate incidents
  • Tools:
    • Debezium, Maxwell, native MySQL Audit plugin

G. Cloud-Native and K8s-Integrated Backups

  • Tool: Velero, cloud provider tools (AWS RDS snapshots, GCP Cloud SQL backups)
  • What: Managed, automated backups with integration to cloud/k8s lifecycle
  • Pros:
    • Highly automated, integrated with orchestration
    • Offsite/region support, point-in-time restore
    • Easy to enable PITR, cross-region
  • Cons:
    • Dependent on cloud provider (lock-in)
    • Not always portable to other clouds/on-prem
  • Best for: Cloud-native, K8s-native, SaaS workloads
  • Restore: Provider-managed, one-click or API-based
  • Tools:
    • Velero, RDS/CloudSQL/Azure backup APIs

H. Rsync/File Copy

  • Tool: rsync, scp, rclone
  • What: Copies DB files directly, may be used with downtime
  • Pros:
    • Simple, can be efficient if DB is not active
  • Cons:
    • Not safe if DB is live (risk of corruption)
    • Not consistent for running databases unless DB is stopped or flushed
  • Best for: Small, non-critical DBs, or as a secondary backup
  • Restore: Manual file copy
  • Tools: rsync, scp
  • Example: rsync -a /var/lib/mysql /mnt/backup/mysql-2024-07-06

3. Detecting and Auditing Changes (Table/Row-Level Audit)

No backup alone will audit all changes. For that, combine with:

  • Binary Logs: Show all modifications; can use mysqlbinlog to analyze changes
  • CDC Tools: Debezium, Maxwell, or Audit Plugins for near-real-time change streaming
  • Audit Tables/Triggers: Custom triggers to log changes (can impact performance)
  • Row-based Replication: Captures what changed, not just SQL statements

To audit what changed since last backup:

  1. Dump binlogs for the period
  2. Parse them for relevant statements
  3. Use tools/scripts to extract DML/DDL changes

4. Recovery from Hacking or Corruption: Best Practice Flow

  1. Identify incident window (when hack/corruption started)
  2. Take snapshot immediately (capture current state for forensics)
  3. Restore last known-good full backup
  4. Apply binlogs up to the point BEFORE hack/corruption
    • If you have audit logs, review for suspicious changes
  5. (Optional) Restore/merge only clean tables/rows from latest to avoid data loss
    • More complex; needs careful scripting and validation

Tip: Keep several layers of backup (full, incremental, binlogs, snapshots). For high-value data, enable CDC/audit, so you can identify exactly what was changed/deleted.


5. How to Set Up a Complete, Production-Grade Backup System

A. Enable Binary Logging

In my.cnf:

[mysqld]
log-bin=mysql-bin
binlog-format=ROW
server-id=1

B. Schedule Regular Full + Incremental + Binlog Backups

  • Weekly full physical (XtraBackup)
  • Daily incremental
  • Continuous binlog archiving
  • Optional: snapshot (cloud/LVM) before dangerous upgrades or major changes
  • Optional: logical dump weekly (mysqldump)

C. Automate and Monitor

  • Use cron jobs, Ansible, or backup orchestrators (e.g. Automysqlbackup, Backup Ninja, Barman for Postgres)
  • Store backups offsite and/or in cloud
  • Regularly test restore (don’t trust untested backups!)
  • Keep backup inventory and hash/validate files

D. Track and Audit Changes

  • Archive and parse binlogs
  • Use audit plugin/CDC tool if needed
  • Log all backup/restore operations (for compliance)

E. Security Best Practices

  • Encrypt backups at rest and in transit
  • Protect access to backup storage
  • Use database users with least privilege for backup tasks

6. Summary Table: Production Backup Methods

MethodBackup TypeProsConsBest ForRestore ComplexityTools
Logical dump (mysqldump)Full/logicalSimple, portable, readableSlow for big DBs, can miss new changesSmall/med, migrationEasymysqldump, mydumper
Physical (XtraBackup)Full/incrFast, reliable, point-in-timeTied to engine/version, more complexLarge, prod, 24×7ModerateXtraBackup, MEB
Binlog-basedPITRPoint-in-time, audit trailNeed binlogs + base, managementAll prod DBsModeratemysqlbinlog
Incremental physicalIncrementalStorage/bandwidth efficientNeeds full+all increments to restoreLarge, activeModerateXtraBackup
Snapshots (LVM/EBS)Volume/blockInstant, disaster recoveryNot app-aware, full volume restoreLarge, cloud, DRModerateLVM, EBS, Velero
Rsync/file copyFile-levelSimple, familiarCorruption risk, not live safeSmall, non-criticalEasyrsync, scp
CDC/AuditAudit/logChange tracking, forensic recoveryNot a backup, for audit onlyCompliance, securityN/ADebezium, audit plugin
Cloud-native backupFull/incrManaged, easy, integrated, automatedCloud lock-in, costs, portabilityCloud, SaaS, K8sVery easyAWS RDS, Velero, etc

7. Which is Best for Which Scenario?

  • Mission-critical, large DB:
    • Physical (XtraBackup) + Binlogs + Incremental + Snapshots
  • Cloud (RDS, CloudSQL):
    • Use provider’s managed backup, plus extra logical export for portability
  • Compliance/Audit-heavy:
    • Binlog archive + CDC/audit plugin
  • Dev/UAT/Small apps:
    • mysqldump/rsync is fine
  • Disaster recovery:
    • Snapshots (LVM/cloud) for instant recovery, plus full backup/restore practice

8. How to Restore: Step-by-Step

A. Full Logical Dump

mysql -u root -p < all_databases.sql
Code language: CSS (css)

B. Physical Backup (XtraBackup)

  1. Stop MySQL
  2. Restore files from backup dir to data dir
  3. Apply logs (if needed)
  4. Start MySQL

C. Binlog Replay (PITR)

  1. Restore full backup (logical or physical)
  2. Get binlogs for target period
  3. Replay with mysqlbinlog (stop before “bad” transaction)

D. Cloud/K8s

  • Use provider’s restore wizard/API
  • For K8s: kubectl apply -f volumesnapshot.yaml, or use Velero

9. Production Checklist

  • Full backup tested restore at least monthly
  • Binlogs archived and tested for PITR
  • Automated monitoring & alerts for backup failures
  • Offsite/cross-region copies
  • Backups encrypted and access controlled
  • Backup retention and cleanup policy
  • Documented restore/runbook for disaster/hack/corruption scenarios
  • Table/row-level audit/logs for sensitive environments

10. Recommended Tools

  • Physical/Incremental: Percona XtraBackup
  • Logical: mysqldump, mysqlpump, mydumper
  • Binlog replay: mysqlbinlog
  • Snapshots: LVM/ZFS, aws ec2 create-snapshot, Velero (K8s)
  • Orchestration: Automysqlbackup, Backup Ninja, Ansible, cron
  • Audit/CDC: Debezium, Maxwell, MySQL Audit Plugin
  • Cloud backup: RDS/CloudSQL/GCP/Azure backup APIs

11. Final Advice: Designing a Bulletproof Backup & Recovery Plan

  • Mix full/incremental/point-in-time backups for max coverage and minimal data loss.
  • ALWAYS test restores! (Automate test restores if possible.)
  • Backups are only as good as your ability to restore — have step-by-step guides/runbooks.
  • Audit for changes: Use binlogs, audit plugins, and regular reports.
  • Protect backups like production data — encrypt, limit access, monitor.
  • Automate and document everything.

12. Sample Architecture for Your Scenario

[MySQL DB] 
   |         |----------------------------|
   |         |                            |
[Full Weekly Physical Backup (XtraBackup)]|---> [Offsite Storage]
   |                                      |
[Daily Incremental (XtraBackup)]          |
   |                                      |
[Continuous Binlog Archive] --------------|
   |
[Audit/CDC Plugin → Change Log]
   |
[LVM or Cloud Snapshot before risky upgrades]

13. Sample Restore Flow (After Hack/Corruption)

  1. Identify last known good backup before hack/corruption
  2. Restore full/incremental backup to new server (safe/isolated env)
  3. Apply binlogs up to just before attack
  4. Review binlogs/audit for unwanted changes
  5. Test, validate, then switch application traffic back

If you need scripts, tool-specific configs, or automation code for any scenario above, let me know your stack (bare metal, cloud, K8s, etc.) and I’ll provide hands-on examples!


TL;DR:

For serious production:
Weekly full + daily incremental + continuous binlog backup + audit/change capture + snapshot before upgrades.

Use XtraBackup for physical/incremental, mysqldump/mydumper for logical, automate and test restores, and use binlogs for point-in-time recovery and change auditing.


Reference

Subscribe
Notify of
guest
0 Comments
Newest
Oldest Most Voted
Inline Feedbacks
View all comments

Certification Courses

DevOpsSchool has introduced a series of professional certification courses designed to enhance your skills and expertise in cutting-edge technologies and methodologies. Whether you are aiming to excel in development, security, or operations, these certifications provide a comprehensive learning experience. Explore the following programs:

DevOps Certification, SRE Certification, and DevSecOps Certification by DevOpsSchool

Explore our DevOps Certification, SRE Certification, and DevSecOps Certification programs at DevOpsSchool. Gain the expertise needed to excel in your career with hands-on training and globally recognized certifications.

0
Would love your thoughts, please comment.x
()
x