Backing up and restoring registry databases
- 1. Overview
- 2. Backup configuration
- 3. Restoring backups
- 4. Continuous recovery on a standby cluster
1. Overview
The platform enables reliable backup and restore processes for the registry databases, ensuring data protection against loss, supporting disaster recovery, and maintaining business continuity.
To achieve these capabilities, the platform uses the Postgres Operator by Crunchy Data (PGO) in combination with pgBackRest, a powerful open-source backup and restore tool for PostgreSQL databases.
PGO simplifies and automates many backup and restore tasks, including:
- Configuring automated backup schedules and retention policies
- Storing backups across multiple locations: Kubernetes, AWS S3 (and S3-compatible services like MinIO), Google Cloud Storage (GCS), and Azure Blob Storage
- Executing on-demand backups
- Supporting Point-In-Time Recovery (PITR)
- Cloning databases into new instances
2. Backup configuration
- Challenge: By default, both the operational and analytical Postgres clusters are configured to continuously archive Write-Ahead Logs (WAL) and create full backups daily. The current retention policy keeps only one full backup: when a new backup is created, pgBackRest automatically deletes the previous one along with its associated WAL files.
- Solution: pgBackRest lets you configure backup types, retention policies, and backup schedules to meet your Recovery Point Objective (RPO) and Recovery Time Objective (RTO) while using storage efficiently.
2.1. Configuring backup schedules
You can set schedules for the following backup types:
- full: A complete backup of the entire Postgres cluster. This is the largest backup type.
- differential: Backs up changes since the last full backup.
- incremental: Backs up changes since the last full, differential, or incremental backup.
- Example: To create a full backup every Sunday at 1:00 AM and incremental backups at 1:00 AM on the remaining days of the week:

spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"
            incremental: "0 1 * * 1-6"
Schedules are defined in the spec.backups.pgbackrest.repos.schedules section using cron format.
PGO will generate corresponding Kubernetes CronJobs to automate the backup process.
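You can confirm that the schedules were registered by listing the CronJobs in the registry namespace; for example:

kubectl get cronjobs -n <registry-name>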
2.2. Managing backup retention policies
PGO allows flexible control over backup retention policies for full and differential backups, ensuring a balance between historical backup availability and efficient disk usage.
2.2.1. Retention logic
When a full backup expires, pgBackRest automatically deletes it along with its associated incremental and differential backups and corresponding WAL files.
- Example: If you have a full backup with four associated incremental backups, all four incremental backups are deleted when the full backup expires.
2.2.2. Retention policy types
There are two types of retention policies for full backups (repo1-retention-full-type):
- count — Keeps a specified number of the most recent full backups. (Default value)
- time — Retains backups for a specified number of days.
2.2.3. Configuration sample
To keep full backups for 14 days, add the following to spec.backups.pgbackrest.global:
spec:
backups:
pgbackrest:
global:
repo1-retention-full: "14"
repo1-retention-full-type: time
If you have two backups, one 12 days old and another 15 days old, neither will be deleted: removing the 15-day-old backup would leave only the 12-day-old one, and the policy requires keeping at least one backup old enough to cover the full 14-day window.
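For comparison, here is a minimal sketch of the default count-based policy, assuming you want to keep the four most recent full backups:

spec:
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "4"
        repo1-retention-full-type: count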
2.3. Managing WAL archiving
When WAL log volume is significant, you can optimize storage by limiting the number of retained WAL logs without compromising recovery capabilities.
pgBackRest provides the following parameters to manage WAL archiving:
- repo1-retention-archive — Number of backups to retain WAL logs for.
- repo1-retention-archive-type — Backup types for which WAL logs are retained. Supported values:
  - full — Retain WAL only for full backups.
  - diff — Retain WAL for full and differential backups.
  - incr — Retain WAL for full, differential, and incremental backups.
2.3.1. Examples
- Space optimization: To minimize WAL log retention, configure WAL archiving only for full backups:

repo1-retention-archive: "2"
repo1-retention-archive-type: full

- Flexible recovery: To support Point-In-Time Recovery (PITR), use:

repo1-retention-archive: "5"
repo1-retention-archive-type: incr
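Like the retention settings above, these parameters go into the spec.backups.pgbackrest.global section of the cluster manifest; a sketch combining them with a full-backup retention policy (the values are illustrative):

spec:
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "2"
        repo1-retention-full-type: count
        repo1-retention-archive: "2"
        repo1-retention-archive-type: full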
2.4. One-time backup creation
Postgres Operator (PGO) allows creating one-time backups using pgBackRest. This process involves two main steps:
- Configuring backup parameters.
- Initiating the backup through an annotation.
Step 1: Configure backup parameters
First, configure the spec.backups.pgbackrest.manual section to define the parameters for the one-time backup. Specify the backup type and any additional pgBackRest options.
spec:
backups:
pgbackrest:
manual:
repoName: repo1 (1)
options:
- --type=full (2)
Parameter explanation:
(1) repoName — The name of the repository where the backup will be stored.
(2) --type=full — Indicates a full backup.
This configuration only prepares the backup parameters but does not initiate the backup process.
Step 2: Launch the backup
To start the backup, add the annotation postgres-operator.crunchydata.com/pgbackrest-backup to your PostgresCluster resource. Include a timestamp in the annotation to indicate when the backup was initiated.
kubectl annotate -n <registry-name> postgrescluster operational \
postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
Replace <registry-name> with your registry or namespace name.
PGO will detect the annotation and initiate the one-time backup based on the configured parameters.
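To monitor the backup, you can watch the Kubernetes Job that PGO creates for it and inspect its logs; a sketch (the job name below is a placeholder, copy the actual name from the output of the first command):

# List jobs in the registry namespace; manual backup jobs appear here
kubectl get jobs -n <registry-name>
# "operational-backup-xxxx" is hypothetical; use the real job name
kubectl logs -n <registry-name> job/operational-backup-xxxx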
Repeating a backup
If you need to rerun the backup with the same parameters, run kubectl annotate again with the --overwrite flag to update the annotation value.
kubectl annotate -n <registry-name> postgrescluster operational --overwrite \
postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
Recommendations
- Save commonly used backup configurations in the manual section for easy reuse.
- Use timestamped annotations to track backup initiation times.
- For large clusters, schedule full backups during off-peak hours to minimize system load.
3. Restoring backups
3.1. Restore to a specific point in time or backup
pgBackRest supports two main approaches for restoring a PostgreSQL database:
- Point-in-Time Recovery (PITR) — Restore the database to its exact state at a specific date and time.
- Restore from a specific backup — Restore directly from a chosen backup without applying WAL logs.
3.1.1. Point-in-time recovery (PITR)
To restore the database to a specific point in time:
- Configure the restore options: Add the following configuration under spec.backups.pgbackrest:

spec:
  backups:
    pgbackrest:
      restore:
        enabled: true
        repoName: repo1
        options:
          - --type=time (1)
          - --target="2022-06-09 14:15:11" (2)
Parameter explanation:
(1) --type=time — Indicates a point-in-time recovery (PITR).
(2) --target — Specifies the target date and time for recovery (YYYY-MM-DD HH:MM:SS). The restore time must be provided in UTC.
- Initiate the restore process: Start the recovery by adding an annotation to the PostgresCluster resource:

kubectl annotate -n <registry-name> postgrescluster operational --overwrite \
  postgres-operator.crunchydata.com/pgbackrest-restore="$(date)"
Replace <registry-name> with your actual registry or namespace name.
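Because the --target value must be provided in UTC, one convenient way to produce a correctly formatted timestamp is date -u; for example:

date -u "+%Y-%m-%d %H:%M:%S"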
3.1.2. Restore from a specific backup
This method allows restoring directly from a chosen backup without applying WAL logs.
- Configure the restore options: Update the spec.backups.pgbackrest section as follows:

spec:
  backups:
    pgbackrest:
      restore:
        enabled: true
        repoName: repo1
        options:
          - --type=immediate (1)
          - --set=20220602-073427F_20220602-073507I (2)
Parameter explanation:
(1) --type=immediate — Restores directly from the specified backup.
(2) --set — The identifier of the backup you wish to restore.
- Find available backups: To get a list of backups, use one of the following methods (see also the sketch after this list):
  - Using the pgBackRest CLI:
    pgbackrest info --stanza=db
  - Checking the contents of the S3 bucket where the backups are stored.
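If you do not have direct access to the pgBackRest CLI, you can run it inside the cluster's pgBackRest repo host pod. A sketch, assuming the default PGO naming where the repo host pod of the operational cluster is operational-repo-host-0:

# Pod name is an assumption; verify with: kubectl get pods -n <registry-name>
kubectl exec -n <registry-name> -it operational-repo-host-0 -- pgbackrest info --stanza=db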
- Initiate the restore process: Start the restore by adding the annotation:

kubectl annotate -n <registry-name> postgrescluster operational --overwrite \
  postgres-operator.crunchydata.com/pgbackrest-restore="$(date)"
3.1.3. Completing the restore process
After the restore is complete, disable the restore mode to prevent further unintended restores.
spec:
backups:
pgbackrest:
restore:
enabled: false
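One way to apply this change without editing the manifest by hand is kubectl patch; a minimal sketch for the operational cluster:

kubectl patch postgrescluster operational -n <registry-name> --type=merge \
  -p '{"spec":{"backups":{"pgbackrest":{"restore":{"enabled":false}}}}}'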
3.2. Cloning from a backup
pgBackRest allows you to clone a database from an existing backup, enabling the creation of a new PostgreSQL instance either from a specific point in time or from a specific backup set.
3.2.1. Cloning to a specific point in time (PITR)
To create a new PostgreSQL instance restored to a specific point in time, add the spec.dataSource section to the manifest of the new database instance.
spec: (1)
dataSource: (2)
pgbackrest: (3)
stanza: db (4)
configuration: (5)
- secret: (6)
name: s3-conf (7)
global: (8)
repo1-path: "/postgres-backup/source_system/operational" (9)
repo1-s3-uri-style: path (10)
repo1-storage-verify-tls: n (11)
repo1-storage-port: "9000" (12)
options: (13)
- --type=time (14)
- --target="2022-06-09 14:15:11-04" (15)
repo: (16)
name: repo1 (17)
s3: (18)
bucket: "bucketName" (19)
endpoint: "endpoint" (20)
region: "us-east-1" (21)
Parameter explanation:
(1) spec — Root section of the Kubernetes manifest.
(2) dataSource — Specifies the data source for the restore process.
(3) pgbackrest — Configures pgBackRest as the data source.
(4) stanza — The pgBackRest configuration name for the database.
(5) configuration — Contains connection details to access backup storage.
(6) secret — Kubernetes Secret object for storing credentials.
(7) name — Name of the secret holding S3 access credentials (s3-conf).
(8) global — Global settings for connecting to the backup repository.
(9) repo1-path — Path to the backup directory in the repository.
(10) repo1-s3-uri-style — URI style for S3 connections (path or virtual).
(11) repo1-storage-verify-tls — Disables TLS verification (n for no).
(12) repo1-storage-port — Port for connecting to the storage service.
(13) options — Options passed to pgBackRest during the restore.
(14) --type=time — Indicates a PITR restore.
(15) --target — Target date and time for the PITR (YYYY-MM-DD HH:MM:SS).
(16) repo — Specifies the backup repository.
(17) name — Name of the repository (repo1).
(18) s3 — S3 configuration for the backup repository.
(19) bucket — S3 bucket name where backups are stored.
(20) endpoint — URL endpoint for the S3-compatible storage.
(21) region — S3 region (e.g., us-east-1).
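The s3-conf secret referenced in the configuration section must hold the pgBackRest S3 credentials. A minimal sketch of creating it, assuming the credentials live in a pgBackRest configuration file named s3.conf (the repo1-s3-key options are standard pgBackRest settings; the file and secret names must match your manifest):

# s3.conf: pgBackRest S3 credentials for repo1
[global]
repo1-s3-key=<access-key-id>
repo1-s3-key-secret=<secret-access-key>

Create the secret from this file in the namespace of the new instance:

kubectl create secret generic s3-conf -n <registry-name> --from-file=s3.conf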
3.2.2. Cloning from a specific backup
To restore a new instance from a specific backup (without applying WAL logs), update the options section as follows:
options:
- --type=immediate
- --set=20220602-073427F_20220602-073507I
Parameter explanation:
- --type=immediate — Restores directly from a specific backup.
- --set — Identifier of the backup set to be restored.
You can list available backups using:
- pgBackRest CLI:
  pgbackrest info --stanza=db
- S3 bucket: Browse the S3 bucket where backups are stored.
3.2.3. Completing the cloning process
Once the manifest is applied and the process begins, PGO will automatically restore the database to a new instance based on the specified parameters.
3.2.4. Best practices
- Use PITR cloning to restore the database to a specific point in time for recovery scenarios.
- Use immediate restore when you need an exact replica of a specific backup set.
- Regularly verify your S3 credentials and repository configuration to avoid issues during cloning.
3.3. Synchronizing data on the analytical cluster
- Problem: The operational and analytical PostgreSQL databases replicate asynchronously. As a result, their backups might not be consistent, even if both databases are restored to the same point in time. This can lead to discrepancies between the two databases.
- Solution: To ensure data consistency between the operational and analytical databases after a restore, you must manually synchronize the analytical cluster with the operational cluster.
3.3.1. Steps to synchronize data
Perform the following steps on the registry database in the analytical cluster to synchronize it with the operational database:
- Disable the current subscription: Temporarily disable the logical replication subscription to prevent new data from syncing during the process.

ALTER SUBSCRIPTION operational_sub DISABLE;
- Truncate all subscribed tables: Clear all tables that are part of the subscription to ensure no conflicting data remains.

SELECT 'TRUNCATE ' || srrelid::regclass || ' CASCADE;' FROM pg_subscription_rel \gexec

This command generates and executes TRUNCATE statements for all subscribed tables, using CASCADE to remove dependent data.
- Drop the existing subscription: After truncating the tables, drop the current subscription to prepare for a fresh sync.

DROP SUBSCRIPTION operational_sub;
- Create a new subscription: Re-establish the subscription to start a fresh logical replication process from the operational cluster.

CREATE SUBSCRIPTION operational_sub
  CONNECTION 'host=OperationalHost user=postgres dbname=registry password=XXXXXX'
  PUBLICATION analytical_pub
  WITH (create_slot=false, slot_name=operational_sub);
Parameter explanation:
- host=OperationalHost — The hostname or IP address of the operational database.
- user=postgres — The user name for the connection.
- dbname=registry — The name of the database.
- password=XXXXXX — Password for the connection.
- PUBLICATION analytical_pub — The publication to subscribe to.
- WITH (create_slot=false, slot_name=operational_sub) — Uses an existing replication slot (see the note after this list).
What happens next?
- Once the new subscription is created, initial table synchronization begins.
- After the initial sync is complete, logical replication starts automatically, and the analytical database stays up to date with the operational cluster.
The time required for the initial synchronization depends on the size of the data in the operational database.
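You can track the progress of the initial synchronization from the analytical database by checking the per-table state in pg_subscription_rel; each table reaches state r (ready) once its initial copy is complete. For example:

-- srsubstate: i = initializing, d = copying data, s = synchronized, r = ready
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel;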
3.3.2. Recommendations
- Avoid changes in the analytical database during the synchronization process to prevent data conflicts.
- Monitor PostgreSQL logs for any errors or warnings during the replication process.
- Verify data consistency after synchronization by comparing key tables between the operational and analytical databases.
4. Continuous recovery on a standby cluster
Advanced high availability (HA) and disaster recovery (DR) strategies often involve deploying standby clusters that continuously recover from the primary database. These standby clusters act as passive replicas, consistently updated through streaming replication or WAL archiving. In the event of a primary cluster failure, the standby can be promoted to minimize downtime and data loss.
The Postgres Operator (PGO) simplifies the deployment and management of standby clusters across multiple Kubernetes environments while using shared backup repositories like S3-compatible storage.
4.1. Creating a standby cluster
A standby cluster continuously restores from backups and WAL logs from the primary cluster’s repository. To configure a standby cluster, ensure it has access to the primary cluster’s backup repository.
4.1.1. Requirements
- Access to the backup repository: The standby cluster must connect to the same S3 bucket (or other compatible storage) used by the primary for backups and WAL logs.
- Configured pgBackRest: The standby cluster uses pgBackRest to continuously restore and apply WAL logs.
4.1.2. Standby cluster configuration
To set up a standby cluster, include the following configuration in your cluster manifest:
spec:
backups:
pgbackrest:
image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1 (1)
repos:
- name: repo1 (2)
s3:
bucket: "bucket" (3)
endpoint: "primary.endpoint" (4)
region: "ca-central-1" (5)
standby:
enabled: true (6)
repoName: repo1 (7)
Parameter explanation:
(1) image — The pgBackRest container image used for backup and restore.
(2) repos — List of backup repositories.
(3) bucket — Name of the S3 bucket for storing backups and WAL logs.
(4) endpoint — URL or IP of the S3-compatible storage.
(5) region — S3 bucket region.
(6) standby.enabled — Enables standby mode (true).
(7) repoName — Name of the repository from which the standby will restore.
4.2. Promoting the standby cluster
Promoting a standby cluster converts it into a primary cluster capable of accepting read/write operations. This process is crucial during failover scenarios when the original primary is no longer available.
4.2.1. Preventing split-brain scenarios
Before promoting the standby, ensure the original primary cluster is fully offline. Having two active primaries writing to the same data source can result in a split-brain scenario, leading to data inconsistencies or loss.
4.2.2. Promotion process
- Disable the standby mode: To promote the standby, disable standby mode by updating the manifest (one way to apply the change is shown after this list):

spec:
  standby:
    enabled: false

- After disabling the standby mode, the following happens:
  - The standby stops recovering from the backup repository.
  - It becomes a writable primary cluster.
  - New WAL logs and backups will now be generated from this new primary.
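As with disabling the restore mode earlier, you can apply this change with kubectl patch; a minimal sketch for the operational cluster:

kubectl patch postgrescluster operational -n <registry-name> --type=merge \
  -p '{"spec":{"standby":{"enabled":false}}}'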
4.2.3. Post-promotion steps
- Verify cluster health: Confirm the newly promoted primary is operational and accepting connections.
- Reconfigure backup and replication: Ensure that backups are running from the new primary and, if necessary, set up a new standby cluster for HA.
- Monitor for issues: Check PostgreSQL and PGO logs to verify that the promotion completed successfully without errors.