PostgreSQL 17 introduces failover slots, a feature that enhances high-availability setups. A replication slot keeps data reliable and consistent between nodes during replication; a failover slot extends that guarantee so replication stays consistent during and after a failover.
Failover slots are a powerful feature that ensures logical replication can continue seamlessly, even after a failover to a standby server. Using failover slots allows logical replication slots to be automatically synchronized across primary and standby nodes, significantly reducing downtime and the need for manual intervention during a failover.
This guide will walk you through setting up a high-availability PostgreSQL cluster using the new failover slots feature. By the end, you’ll have a robust replication setup capable of seamlessly handling a failover.
Why Failover Slots Matter From a Historical Perspective
Challenges in PostgreSQL 15
- Replication Slots Tied to the Primary Node: In PostgreSQL 15, replication slots were only created on the primary server. All logical replication slots were lost if the primary server failed, leading to significant replication delays and data loss.
- Manual Failover Management: During failover scenarios, administrators manually recreated replication slots on the new primary server, which increased complexity, introduced errors, and prolonged downtime.
- No Slot Synchronization: Standby servers had no way of knowing about logical replication slots on the primary. This lack of synchronization led to a complete reset of replication streams if a failover occurred.
Improvements in PostgreSQL 16
Minimal Logical Decoding
PostgreSQL 16 introduced a feature called minimal logical decoding on standbys:
- Minimal Decoding on Standby: This allowed standby servers to decode WAL logs to prepare for logical replication, enabling pre-warmed slots for use if a failover occurred.
- Faster Failover: By pre-decoding WAL changes on the standby, it was possible to reduce replication lag when promoting a standby to the primary. However, this still required some manual configuration to ensure smooth failover.
PostgreSQL 17: The Game-Changer – Failover Slots
- Failover Slots: PostgreSQL 17's failover slots eliminate the need for manual intervention by automatically synchronizing logical replication slots between the primary and standby servers.
- Automatic Synchronization: The new slot sync worker ensures that failover-enabled slots (failover = true) are always synchronized, even while the primary node is active.
- Seamless Transition: Upon failover, the standby server can take over as the primary without losing any replication slots, ensuring zero data loss and continuous replication.
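To illustrate what a failover-enabled slot looks like at the SQL level, the sketch below creates one directly with pg_create_logical_replication_slot (the slot name demo_slot is illustrative; in the setup later in this guide, the subscription creates its slot for you):

```sql
-- Create a logical slot with the failover flag enabled.
-- Arguments: slot name, output plugin, temporary, two-phase, failover.
SELECT pg_create_logical_replication_slot('demo_slot', 'pgoutput', false, false, true);

-- Inspect the flag; the slot sync worker only synchronizes slots with failover = true.
SELECT slot_name, failover FROM pg_replication_slots WHERE slot_name = 'demo_slot';
```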
| Feature | PostgreSQL 15 | PostgreSQL 16 | PostgreSQL 17 |
|---|---|---|---|
| Logical Replication | Yes | Yes | Yes |
| Automatic Slot Synchronization | No | Minimal logical decoding on standby | Full failover slots |
| Failover Handling | Manual intervention needed | Pre-warmed slots on standby | Automatic failover slots |
| Slot Synchronization to Standby | Not supported | Minimal, requires configuration | Automatic with slot sync worker |
| High Availability for Logical Replication | Limited | Improved with minimal decoding | Seamless with failover slots |
Creating a High-Availability Cluster With Failover Slots
This section will walk you through creating a PostgreSQL high-availability cluster with failover slots. In our example, we’ll use the following nodes:
- NodeA (Primary Server)
- NodeB (Physical Standby)
- NodeC (Logical Subscriber)
Prerequisites
Before we start, ensure you have:
- PostgreSQL 17 installed on all three nodes.
- Passwordless SSH access between each node.
- A basic understanding of PostgreSQL, PostgreSQL replication, and PostgreSQL configuration files.
Step 1: Configuring the Primary Node (NodeA)
1.1 Initialize the cluster on NodeA
After installing PostgreSQL on the primary node, initialize the cluster; you can use the following commands:
mkdir -p /home/pgedge/nodeA
initdb -D /home/pgedge/nodeA --no-locale -E UTF8
pg_ctl -D /home/pgedge/nodeA -l /home/pgedge/logs/nodeA.log start
1.2 Configure replication in the postgresql.conf file
After initializing the cluster, edit the postgresql.conf file (located by default at /home/pgedge/nodeA/postgresql.conf) and set the following parameter values:
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
synchronous_standby_names = '*'
synchronized_standby_slots = 'sb1_slot'
port = 5432
1.3 Update the pg_hba.conf File to Allow Replication Access
The pg_hba.conf file manages client authentication for the PostgreSQL server. Add the following entry to /home/pgedge/nodeA/pg_hba.conf to ensure access for a replication user:
host replication replicator 127.0.0.1/32 md5
Then, reload the configuration:
pg_ctl -D /home/pgedge/nodeA reload
1.4 Create a Replication User
Then, log into PostgreSQL and create the replication user:
psql -d postgres -p 5432
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'replicator_password';
1.5 Create a Table and Set Up a Publication
Next, you’ll need to create a table and create an associated publication:
CREATE TABLE foo (c1 INT PRIMARY KEY);
GRANT SELECT ON foo TO replicator;
CREATE PUBLICATION mypub FOR TABLE foo;
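Optionally, seed the table with a couple of rows so there is existing data to check once the subscriber is attached (the values are arbitrary):

```sql
INSERT INTO foo VALUES (1), (2);
```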
Step 2: Configuring the Physical Standby (NodeB)
2.1 Create a Base Backup
After installing PostgreSQL on NodeB, use pg_basebackup to clone the primary cluster. There is no need to run initdb on the standby; pg_basebackup requires an empty target directory, and the -R flag writes the standby signal and connection settings for you:
mkdir -p /home/pgedge/nodeB
pg_basebackup -D /home/pgedge/nodeB -R -X stream -P -h localhost -p 5432 -U replicator
Don't start NodeB yet; first adjust its configuration in the next step.
2.2 Configure postgresql.conf on NodeB
The base backup copies the primary's configuration, so modify the /home/pgedge/nodeB/postgresql.conf file, setting:
port = 5433
primary_conninfo = 'host=localhost port=5432 user=replicator password=replicator_password dbname=postgres application_name=sb1_slot'
primary_slot_name = 'sb1_slot'
hot_standby_feedback = on
sync_replication_slots = on
2.3 Enable Failover Slot Synchronization
Start NodeB, then use the psql client to log in:
pg_ctl -D /home/pgedge/nodeB -l /home/pgedge/logs/nodeB.log start
psql -d postgres -p 5433
Then, use the following statements to configure replication for NodeB:
ALTER SYSTEM SET sync_replication_slots = on;
ALTER SYSTEM SET hot_standby_feedback = on;
ALTER SYSTEM SET synchronized_standby_slots = 'sb1_slot';
Exit the psql client and restart NodeB:
pg_ctl -D /home/pgedge/nodeB restart
2.4 Verify Slot Synchronization
Then, reconnect to NodeB with psql and verify that the slots are synchronized:
SELECT slot_name, failover, synced FROM pg_replication_slots;
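For comparison, a similar query can be run on NodeA (psql -d postgres -p 5432). Note that synced logical slots only appear on NodeB after a failover-enabled subscription exists, which is created in Step 3:

```sql
-- On NodeA: sb1_slot should appear as a physical slot; a failover-enabled
-- logical slot appears once the subscription in Step 3 is created.
SELECT slot_name, slot_type, failover, synced FROM pg_replication_slots;
```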
Step 3: Setting Up the Logical Subscriber (NodeC)
3.1 Initialize the cluster and configure NodeC
After installing PostgreSQL, initialize the cluster; you can use the following commands:
mkdir -p /home/pgedge/nodeC
initdb -D /home/pgedge/nodeC --no-locale -E UTF8
Then, edit the /home/pgedge/nodeC/postgresql.conf file, setting the following parameter values:
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
sync_replication_slots = on
port = 5444
After editing the configuration file, start NodeC:
pg_ctl -D /home/pgedge/nodeC -l /home/pgedge/logs/nodeC.log start
3.2 Create a Subscription on NodeC
Use psql to connect to NodeC (psql -d postgres -p 5444), then create a subscription with failover enabled:
CREATE SUBSCRIPTION foosub CONNECTION 'dbname=postgres host=localhost port=5432 user=replicator password=replicator_password' PUBLICATION mypub WITH (failover = true);
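Because the subscription was created with failover = true, the slot PostgreSQL creates for it on NodeA should carry the failover flag. This can be checked on NodeA (by default, the slot is named after the subscription):

```sql
SELECT slot_name, failover FROM pg_replication_slots WHERE slot_name = 'foosub';
```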
Step 4: Simulating Failover and Ensuring Continuity
You can use the following commands to simulate a failover and confirm that replication continues and data integrity is preserved.
4.1 Simulating a Failover
Use the following commands to simulate a failure of NodeA, followed by promotion from standby to primary of NodeB:
pg_ctl -D /home/pgedge/nodeA stop
pg_ctl -D /home/pgedge/nodeB promote
4.2 Update the Subscription on NodeC
After promoting NodeB, log in to NodeC and update the subscription's connection string to reflect that NodeB is now the primary:
ALTER SUBSCRIPTION foosub DISABLE;
ALTER SUBSCRIPTION foosub CONNECTION 'dbname=postgres host=localhost port=5433 user=replicator password=replicator_password';
ALTER SUBSCRIPTION foosub ENABLE;
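To confirm the subscription has resumed against NodeB, the apply worker's state can be inspected on NodeC; a non-null pid indicates the worker is running:

```sql
SELECT subname, pid, received_lsn FROM pg_stat_subscription WHERE subname = 'foosub';
```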
4.3 Verify Data Continuity
To test replication, use psql to log in to NodeB (now the primary, on port 5433) and insert a few rows:
INSERT INTO foo VALUES (3), (4);
Check replication on NodeC:
SELECT * FROM foo;
Conclusion
PostgreSQL 17’s failover slot feature allows for seamless failover in logical replication environments. By following the steps outlined in this guide, you can create a high-availability cluster that ensures uninterrupted data flow, even during a primary server failure.
By optimizing configurations and leveraging PostgreSQL 17’s new capabilities, you can create a resilient and efficient database infrastructure for your mission-critical applications.
Source:
https://dzone.com/articles/setting-up-failover-slots-in-postgresql-17