Re: Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication

From: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
To: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication
Date: 2025-10-30 13:38:50
Message-ID: CANzqJaAxXeczeZ+PyHO8RXZbHA-ipajFL8RKzXOrkxJUSG_fqA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-admin

I'd log into Server G and tail -f the Postgresql log file, then run "pg_ctl
status". Maybe then "pg_ctl start -wt9999", depending on what the log file
says,

On Thu, Oct 30, 2025 at 5:30 AM Edwin UY <edwin(dot)uy(at)gmail(dot)com> wrote:

> Hi,
>
> Apologies for a long email. I suppose as much information as possible will
> help with troubleshooting
> PostgreSQL is Version 11. I know, it's old, I don't have a choice due to
> the application.
>
> There is a PRIMARY and 2 replicas, SYNC and ASYNC.
> We had a network outage that rendered the application unusable for some
> reason even though we still have a PRIMARY and a replication server in
> place.
> This is now resolved since the network is restored so I am just wanting to
> get some guidance for a quick resolution in the future.
>
> Not really sure how to confirm which one is SYNC or ASYNC.
> select * from pg_stat_replication from the PRIMARY shows nothing
> So, I am left with no choice but to trust the documentation where it says
>
> SERVER -E = PRIMARY
> SERVER -F = ASYNC
> SERVER -G = SYNC
>
> When we have the network issue.
> SERVER-E and SERVER-F are accessible and they can communicate to each
> other. SERVER-G is not accessible. However the application connection is
> intermittently dropping.
>
> The primary is showing several errors like below:
> STATEMENT: ROLLBACK PREPARED 'gid'
> ERROR: prepared transaction with identifier "gid" is busy
>
> SERVER-F is showing
> FATAL: could not connect to the primary server: could not connect to
> server: No route to host
> Is the server running on host "SERVER-G" and accepting
> TCP/IP connections on port 5432?
>
> Can't check SERVER-G as it is not accessible.
>
> I assume the prepared transactions are from the replication, not from the
> application.
> The error from SERVER-F is as expected since SERVER-G is not accessible.
> Under this scenario, the application is intermittently having issues
> connecting to the database. Not sure why.
> We have re-started both databases SERVER-E and SERVER-F and clear up the
> prepared transaction as well using
> https://www.cybertec-postgresql.com/en/prepared-transactions.
> After startup we can see the prepared transaction gone, pg_prepared_xacts
> is emptty and then will show one one prepare transaction that is active
> based on pg_stat_activity.
> select * from pg_stat_replication still shows nothing.
> To resolve the SERVER-F error, we change the recovery.conf and
> change primary_conninfo to use SERVER-E.
> This still did not resolve the application issue and the primary log still
> shows the following every so often.
>
> STATEMENT: ROLLBACK PREPARED 'gid'
> ERROR: prepared transaction with identifier "gid" is busy
>
> At this stage, I thought maybe the PRIMARY and the replicas are configured
> in such a way that the PRIMARY must receive confirmation from both that it
> has committed too otherwise it will just continue waiting.
> Under this scenario, it is not able too since SERVER-G is not accessible.
> Does that make sense?
>
> Anyway, maybe someone will be interested to read this email and can shed
> some light on this and can advise whether there's some configuration
> setting somewhere that we should have modified as a temporary workaround.
> Could it be because of synchronous_commit= on? Maybe we should have
> changed this when SERVER-G is not accessible?
>
> Everything is back to normal once SERVER-G has become accessible again.
> That is about 6 hours though :( and doesn't explain why things will stop
> working normally when a replica is down and the PRIMARY is still accessible.
> Does that mean, if both replicas are down and only the PRIMARY is
> accessible, we have to totally turn off / disable replication?
> If we do need to break the replica, when the PRIMARY is UP and both
> replicas are inaccessible, do we just unset synchronous_standby_names?
>
> Any reply is much appreciated. Thanks in advance.
>
> Regards,
> Ed
>
>

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Scott Ribe 2025-10-30 13:52:02 Re: Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication
Previous Message Laurenz Albe 2025-10-30 12:15:03 Re: WAL replay is too slow on secondary server