When I initially set up my CNPG (CloudNativePG) cluster, I overlooked the option to enable data checksums. Data checksums help protect data integrity in PostgreSQL by letting the server detect corruption in data pages. When they are enabled, a checksum is calculated for each data page and stored in the page header whenever the page is written. When the page is read back, the checksum is recalculated and compared with the stored value, so corruption is detected as soon as a damaged page is read instead of going unnoticed. This guide walks through enabling data checksums on an existing CNPG cluster.
If you plan ahead, you can enable data checksums at cluster initialization by adding the following to your cluster spec:
spec:
  bootstrap:
    initdb:
      dataChecksums: true
For clusters where this step was missed, PostgreSQL provides the pg_checksums tool to enable or disable checksums after initialization. However, pg_checksums works offline: the PostgreSQL server must be cleanly shut down while it runs. To keep the cluster available, we take one instance offline at a time and run the tool on it while the remaining instances keep serving.
Here’s a step-by-step guide to enabling data checksums on an existing CNPG cluster:
- Check if Checksums are Already Enabled
Run the following SQL command to get the status:
SHOW data_checksums;
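To run the same check on every instance from your workstation, a small shell sketch like the one below can help. It assumes the instance pods carry the cnpg.io/cluster=<cluster-name> label, that the PostgreSQL container is named postgres, and that psql inside the pod can connect over the local socket. checksum_status is a hypothetical helper and is not part of CNPG or its plugin.

```shell
# Hypothetical helper (not part of CNPG): print "<pod> <data_checksums>"
# for every running instance of a CNPG cluster.
# Assumes pods are labelled cnpg.io/cluster=<cluster-name> and the
# PostgreSQL container is named "postgres".
checksum_status() {
  ns="$1"
  cluster="$2"
  # Only running pods, so finished bootstrap/join job pods are ignored.
  pods=$(kubectl get pods -n "$ns" -l "cnpg.io/cluster=$cluster" \
    --field-selector=status.phase=Running \
    -o jsonpath='{.items[*].metadata.name}')
  for pod in $pods; do
    # -t (tuples only) and -A (unaligned) reduce the output to "on" or "off"
    value=$(kubectl exec -n "$ns" "$pod" -c postgres -- \
      psql -tAc 'SHOW data_checksums;')
    echo "$pod $value"
  done
}
```

Running checksum_status <cluster-namespace> <cluster-name> prints one line per pod, with the pod name followed by on or off.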
If the output is off, you need to proceed with enabling checksums.
- Backup Your Database
Before performing any manual operations on your database, ensure you have a current backup. This step is crucial to prevent data loss in case something goes wrong.
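If the cluster already has a backup target configured, such as a Barman object store or volume snapshots, one way to take an on-demand backup is to create a Backup resource. The sketch below assumes such a target exists. backup_before_checksums and the pre-checksums- name prefix are hypothetical. The cnpg plugin's backup subcommand (kubectl cnpg backup <cluster-name>) serves the same purpose.

```shell
# Hypothetical helper: request an on-demand backup of a CNPG cluster by
# creating a Backup resource, then print the generated backup name.
# Assumes the Cluster already has a backup target configured.
backup_before_checksums() {
  ns="$1"
  cluster="$2"
  name="pre-checksums-$(date +%Y%m%d%H%M%S)"
  # The heredoc is the Backup manifest, passed to kubectl on stdin.
  kubectl apply -n "$ns" -f - <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: $name
spec:
  cluster:
    name: $cluster
EOF
  echo "$name"
}
```

Before moving on, check that kubectl get backup --namespace <cluster-namespace> <backup-name> reports the phase completed.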
- Install the CNPG Plugin
Ensure you have the CNPG kubectl plugin installed. It adds kubectl cnpg subcommands for managing CNPG clusters, including the fencing and status commands used in this guide. Refer to the official documentation for installation instructions.
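If you use krew, the kubectl plugin manager, you can install the plugin from the krew index under the name cnpg. The minimal sketch below assumes krew is already set up. ensure_cnpg_plugin is a hypothetical helper that installs the plugin only when kubectl cnpg version fails.

```shell
# Hypothetical helper: install the cnpg kubectl plugin via krew unless it
# is already available. Assumes krew itself is installed.
ensure_cnpg_plugin() {
  if kubectl cnpg version >/dev/null 2>&1; then
    echo "cnpg plugin already installed"
  else
    kubectl krew install cnpg
  fi
}
```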
- Fence the Instance
To safely take an instance offline, we need to fence it. Fencing shuts down PostgreSQL on that instance while the pod itself keeps running, so the instance stops serving connections and its data directory can be worked on. Use the following command:
kubectl cnpg fencing on --namespace <cluster-namespace> <cluster-name> <cluster-instance>
However, this did not work for me, so I added the following annotation to the Cluster resource instead:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  annotations:
    cnpg.io/fencedInstances: "[\"<cluster-name>-<cluster-instance>\"]"
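You can also set the same annotation without editing the manifest by using kubectl annotate. This sketch assumes the instance name is the pod name (<cluster-name>-<cluster-instance>, for example cluster-example-1). fence_instance is a hypothetical helper. Because the command uses --overwrite, it replaces any instances already listed in the annotation.

```shell
# Hypothetical helper: fence one CNPG instance by setting the
# cnpg.io/fencedInstances annotation on the Cluster resource.
# --overwrite replaces any instances that were already fenced.
fence_instance() {
  ns="$1"
  cluster="$2"
  instance="$3"   # pod name, e.g. cluster-example-1
  kubectl annotate cluster "$cluster" -n "$ns" --overwrite \
    "cnpg.io/fencedInstances=[\"$instance\"]"
}
```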
- Enable Data Checksums
Once the instance is fenced, exec into its pod and run pg_checksums to enable checksums. This only works while PostgreSQL is shut down, which is what fencing takes care of:
pg_checksums --enable --progress --verbose
When you run pg_checksums, it scans all data blocks and calculates and stores their checksums. This can take a while depending on the size of your database. When it finishes, the output looks like this:
72497/72497 MB (100%) computed
Checksum operation completed
Files scanned: 9675
Blocks scanned: 9279637
Files written: 0
Blocks written: 0
pg_checksums: syncing data directory
pg_checksums: updating control file
Data checksum version: 1
Checksums enabled in cluster
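You can also script this step from outside the pod with kubectl exec. The sketch below assumes the container is named postgres and that PGDATA is set in the container environment, which is why pg_checksums works above without -D. After enabling checksums, it reads the control file with pg_controldata. That tool reports a "Data page checksum version" of 1 when checksums are enabled and 0 when they are not, so it gives a quick confirmation without rescanning every block. enable_checksums is a hypothetical helper.

```shell
# Hypothetical helper: enable checksums on a fenced instance, then confirm
# the result from the control file. Assumes the container is named
# "postgres" and PGDATA is set in its environment.
enable_checksums() {
  ns="$1"
  pod="$2"
  # pg_checksums exits non-zero (e.g. "cluster must be shut down")
  # if PostgreSQL is still running, so stop on failure.
  kubectl exec -n "$ns" "$pod" -c postgres -- \
    pg_checksums --enable --progress --verbose || return 1
  # Extract the value of "Data page checksum version:" (1 means enabled).
  version=$(kubectl exec -n "$ns" "$pod" -c postgres -- pg_controldata |
    awk -F: '/^Data page checksum version/ { gsub(/ /, "", $2); print $2 }')
  [ "$version" = "1" ]
}
```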
- Unfence the Instance
Now that data checksums are enabled on this instance, you can lift the fence either by removing the annotation or by running the following command:
kubectl cnpg fencing off --namespace <cluster-namespace> <cluster-name> <cluster-instance>
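If you used the annotation, you can lift the fence by removing the key with kubectl annotate (a trailing - deletes it), and kubectl wait can then block until the pod reports Ready again. unfence_instance is a hypothetical helper. Removing the annotation unfences every instance listed in it, not only this one.

```shell
# Hypothetical helper: lift fencing by deleting the annotation (the
# trailing "-" tells kubectl annotate to remove the key), then wait for
# the pod to report Ready again.
unfence_instance() {
  ns="$1"
  cluster="$2"
  pod="$3"
  kubectl annotate cluster "$cluster" -n "$ns" cnpg.io/fencedInstances- || return 1
  kubectl wait -n "$ns" --for=condition=Ready "pod/$pod" --timeout=10m
}
```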
- Repeat for the Remaining Instances
Once the instance has been unfenced, wait for replication to catch up. You can check the replication status with:
kubectl cnpg status --namespace <cluster-namespace> <cluster-name>
Repeat this process for the remaining instances until they all have data checksums enabled.
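For clusters with several replicas, you can chain the per-instance cycle in a loop. This is a rough sketch under several assumptions:

- It reads the current primary from the Cluster's status.currentPrimary field.
- It finds pods through the cnpg.io/cluster label.
- It assumes fenced pods report not Ready.
- It reduces the replication check to waiting for the pod to become Ready again, so confirm with kubectl cnpg status as above.

The primary is skipped on purpose, because as far as I can tell CNPG does not fail over away from a fenced primary. Once the replicas are done, switch over (for example with kubectl cnpg promote <cluster-name> <instance>) and repeat the steps on the former primary. If any step fails, the loop stops and leaves that instance fenced for inspection. rolling_enable_checksums is a hypothetical name.

```shell
# Hypothetical rolling helper: enable checksums on each replica of a CNPG
# cluster, one instance at a time. The current primary is skipped on purpose;
# switch over and handle it separately.
# Assumes the container is named "postgres" and fenced pods report not Ready.
rolling_enable_checksums() {
  ns="$1"
  cluster="$2"
  primary=$(kubectl get cluster "$cluster" -n "$ns" \
    -o jsonpath='{.status.currentPrimary}')
  # Only running pods, so finished bootstrap/join job pods are ignored.
  pods=$(kubectl get pods -n "$ns" -l "cnpg.io/cluster=$cluster" \
    --field-selector=status.phase=Running \
    -o jsonpath='{.items[*].metadata.name}')
  for pod in $pods; do
    [ "$pod" = "$primary" ] && continue
    echo "==> $pod"
    # Fence only this instance (replaces any existing fenced list).
    kubectl annotate cluster "$cluster" -n "$ns" --overwrite \
      "cnpg.io/fencedInstances=[\"$pod\"]" || return 1
    # Wait for the pod to drop out of Ready once PostgreSQL is stopped.
    kubectl wait -n "$ns" --for=condition=Ready=false "pod/$pod" \
      --timeout=10m || return 1
    # pg_checksums refuses to run unless the server was shut down cleanly.
    kubectl exec -n "$ns" "$pod" -c postgres -- \
      pg_checksums --enable --progress --verbose || return 1
    # Remove the annotation to unfence, then wait for the pod to be Ready.
    kubectl annotate cluster "$cluster" -n "$ns" \
      cnpg.io/fencedInstances- || return 1
    kubectl wait -n "$ns" --for=condition=Ready "pod/$pod" \
      --timeout=30m || return 1
  done
  echo "Replicas done. Switch over from $primary, then repeat the steps for it."
}
```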
Refer to the PostgreSQL documentation on data checksums and the pg_checksums tool for more detailed information on checksums and the enabling process, in particular its notes on using pg_checksums in replication setups.