| From: | Ron Johnson <ronljohnsonjr(at)gmail(dot)com> |
|---|---|
| To: | Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication |
| Date: | 2025-10-30 13:38:50 |
| Message-ID: | CANzqJaAxXeczeZ+PyHO8RXZbHA-ipajFL8RKzXOrkxJUSG_fqA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-admin |
I'd log into Server G and tail -f the Postgresql log file, then run "pg_ctl
status". Maybe then "pg_ctl start -wt9999", depending on what the log file
says,
On Thu, Oct 30, 2025 at 5:30 AM Edwin UY <edwin(dot)uy(at)gmail(dot)com> wrote:
> Hi,
>
> Apologies for a long email. I suppose as much information as possible will
> help with troubleshooting
> PostgreSQL is Version 11. I know, it's old, I don't have a choice due to
> the application.
>
> There is a PRIMARY and 2 replicas, SYNC and ASYNC.
> We had a network outage that rendered the application unusable for some
> reason even though we still have a PRIMARY and a replication server in
> place.
> This is now resolved since the network is restored so I am just wanting to
> get some guidance for a quick resolution in the future.
>
> Not really sure how to confirm which one is SYNC or ASYNC.
> select * from pg_stat_replication from the PRIMARY shows nothing
> So, I am left with no choice but to trust the documentation where it says
>
> SERVER -E = PRIMARY
> SERVER -F = ASYNC
> SERVER -G = SYNC
>
> When we have the network issue.
> SERVER-E and SERVER-F are accessible and they can communicate to each
> other. SERVER-G is not accessible. However the application connection is
> intermittently dropping.
>
> The primary is showing several errors like below:
> STATEMENT: ROLLBACK PREPARED 'gid'
> ERROR: prepared transaction with identifier "gid" is busy
>
> SERVER-F is showing
> FATAL: could not connect to the primary server: could not connect to
> server: No route to host
> Is the server running on host "SERVER-G" and accepting
> TCP/IP connections on port 5432?
>
> Can't check SERVER-G as it is not accessible.
>
> I assume the prepared transactions are from the replication, not from the
> application.
> The error from SERVER-F is as expected since SERVER-G is not accessible.
> Under this scenario, the application is intermittently having issues
> connecting to the database. Not sure why.
> We have re-started both databases SERVER-E and SERVER-F and clear up the
> prepared transaction as well using
> https://www.cybertec-postgresql.com/en/prepared-transactions.
> After startup we can see the prepared transaction gone, pg_prepared_xacts
> is emptty and then will show one one prepare transaction that is active
> based on pg_stat_activity.
> select * from pg_stat_replication still shows nothing.
> To resolve the SERVER-F error, we change the recovery.conf and
> change primary_conninfo to use SERVER-E.
> This still did not resolve the application issue and the primary log still
> shows the following every so often.
>
> STATEMENT: ROLLBACK PREPARED 'gid'
> ERROR: prepared transaction with identifier "gid" is busy
>
> At this stage, I thought maybe the PRIMARY and the replicas are configured
> in such a way that the PRIMARY must receive confirmation from both that it
> has committed too otherwise it will just continue waiting.
> Under this scenario, it is not able too since SERVER-G is not accessible.
> Does that make sense?
>
> Anyway, maybe someone will be interested to read this email and can shed
> some light on this and can advise whether there's some configuration
> setting somewhere that we should have modified as a temporary workaround.
> Could it be because of synchronous_commit= on? Maybe we should have
> changed this when SERVER-G is not accessible?
>
> Everything is back to normal once SERVER-G has become accessible again.
> That is about 6 hours though :( and doesn't explain why things will stop
> working normally when a replica is down and the PRIMARY is still accessible.
> Does that mean, if both replicas are down and only the PRIMARY is
> accessible, we have to totally turn off / disable replication?
> If we do need to break the replica, when the PRIMARY is UP and both
> replicas are inaccessible, do we just unset synchronous_standby_names?
>
> Any reply is much appreciated. Thanks in advance.
>
> Regards,
> Ed
>
>
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Scott Ribe | 2025-10-30 13:52:02 | Re: Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication |
| Previous Message | Laurenz Albe | 2025-10-30 12:15:03 | Re: WAL replay is too slow on secondary server |