Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication

From: Edwin UY <edwin(dot)uy(at)gmail(dot)com>
To: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication
Date: 2025-10-30 09:30:00
Message-ID: CA+wokJ_sf=EZrdHgPnPn_6vUG43B0yTHS50MaM5jLeS1z6LYyQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi,

Apologies for a long email. I suppose as much information as possible will
help with troubleshooting
PostgreSQL is Version 11. I know, it's old, I don't have a choice due to
the application.

There is a PRIMARY and 2 replicas, SYNC and ASYNC.
We had a network outage that rendered the application unusable for some
reason even though we still have a PRIMARY and a replication server in
place.
This is now resolved since the network is restored so I am just wanting to
get some guidance for a quick resolution in the future.

Not really sure how to confirm which one is SYNC or ASYNC.
select * from pg_stat_replication from the PRIMARY shows nothing
So, I am left with no choice but to trust the documentation where it says

SERVER -E = PRIMARY
SERVER -F = ASYNC
SERVER -G = SYNC

When we have the network issue.
SERVER-E and SERVER-F are accessible and they can communicate to each
other. SERVER-G is not accessible. However the application connection is
intermittently dropping.

The primary is showing several errors like below:
STATEMENT: ROLLBACK PREPARED 'gid'
ERROR: prepared transaction with identifier "gid" is busy

SERVER-F is showing
FATAL: could not connect to the primary server: could not connect to
server: No route to host
Is the server running on host "SERVER-G" and accepting
TCP/IP connections on port 5432?

Can't check SERVER-G as it is not accessible.

I assume the prepared transactions are from the replication, not from the
application.
The error from SERVER-F is as expected since SERVER-G is not accessible.
Under this scenario, the application is intermittently having issues
connecting to the database. Not sure why.
We have re-started both databases SERVER-E and SERVER-F and clear up the
prepared transaction as well using
https://www.cybertec-postgresql.com/en/prepared-transactions.
After startup we can see the prepared transaction gone, pg_prepared_xacts
is emptty and then will show one one prepare transaction that is active
based on pg_stat_activity.
select * from pg_stat_replication still shows nothing.
To resolve the SERVER-F error, we change the recovery.conf and
change primary_conninfo to use SERVER-E.
This still did not resolve the application issue and the primary log still
shows the following every so often.

STATEMENT: ROLLBACK PREPARED 'gid'
ERROR: prepared transaction with identifier "gid" is busy

At this stage, I thought maybe the PRIMARY and the replicas are configured
in such a way that the PRIMARY must receive confirmation from both that it
has committed too otherwise it will just continue waiting.
Under this scenario, it is not able too since SERVER-G is not accessible.
Does that make sense?

Anyway, maybe someone will be interested to read this email and can shed
some light on this and can advise whether there's some configuration
setting somewhere that we should have modified as a temporary workaround.
Could it be because of synchronous_commit= on? Maybe we should have changed
this when SERVER-G is not accessible?

Everything is back to normal once SERVER-G has become accessible again.
That is about 6 hours though :( and doesn't explain why things will stop
working normally when a replica is down and the PRIMARY is still accessible.
Does that mean, if both replicas are down and only the PRIMARY is
accessible, we have to totally turn off / disable replication?
If we do need to break the replica, when the PRIMARY is UP and both
replicas are inaccessible, do we just unset synchronous_standby_names?

Any reply is much appreciated. Thanks in advance.

Regards,
Ed

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Laurenz Albe 2025-10-30 10:03:40 Re: WAL replay is too slow on secondary server
Previous Message OMPRAKASH SAHU 2025-10-30 04:36:56 WAL replay is too slow on secondary server