| From: | SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com> |
|---|---|
| To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
| Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication |
| Date: | 2026-03-18 07:28:44 |
| Message-ID: | CAHg+QDdd7BXB9HD9ddevk_D5TtweEBantcvJ5up5hznryZ33_w@mail.gmail.com |
| Lists: | pgsql-hackers |
Reviving this thread.
On Sun, Jan 29, 2023 at 9:55 PM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> For proc die, it looks like the suggestion was to process it
> immediately and upon next restart, don't allow user connections unless
> all sync standbys were caught up. However, we need to be able to allow
> replication connections from standbys so that they'll be able to
> stream the needed WAL and catch up with primary, allow superuser or
> users with pg_monitor role to connect to perform ALTER SYSTEM to
> remove the unresponsive sync standbys if any from the list or disable
> sync replication altogether or monitor for flush lsn/catch up status.
> And block all other connections. Note that replication, superuser and
> users with pg_monitor role connections are allowed only after the
> server reaches a consistent state not before that to not read any
> inconsistent data.
>
Allowing replication, superuser and pg_monitor seems reasonable to me.
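To make the rule concrete, here is a minimal sketch of the admission check as I read it. This is only an illustration in Python, not PostgreSQL code; ConnRequest, may_connect and their fields are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class ConnRequest:
    # Hypothetical description of an incoming connection.
    is_replication: bool = False
    is_superuser: bool = False
    roles: frozenset = frozenset()


def may_connect(req, reached_consistency, sync_standbys_caught_up):
    """Decide whether a connection is admitted after a restart."""
    # Nobody connects before the server reaches a consistent state.
    if not reached_consistency:
        return False
    # Once the sync standbys have caught up, everyone is admitted.
    if sync_standbys_caught_up:
        return True
    # Until then, admit only replication, superuser and pg_monitor.
    return (req.is_replication
            or req.is_superuser
            or "pg_monitor" in req.roles)
```

A regular user would be rejected until the standbys catch up, while a walsender or an admin removing an unresponsive standby would get through.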
>
> The trickiest part of doing the above is how we detect upon restart
> that the server received proc die while waiting for sync replication
> ACK. One idea might be to set a flag in the control file before the
> crash. Second idea might be to write a marker file (although I don't
> favor this idea); presence indicates that the server was waiting for
> sync replication ACK before the crash. However, we may not detect all
> sorts of crashes in a backend when it is waiting for sync replication
> ACK to do any of these two ideas. Therefore, this may not be a
> complete solution.
>
You cannot control how the server crashes. It could be a simple power failure,
in which case neither the control file flag nor the marker file may have
reached disk.
Additionally, setting either one adds work to the critical transaction commit
path.
>
> Third idea might be to just let the primary wait for sync standbys to
> catch up upon restart irrespective of whether it was crashed or not
> while waiting for sync replication ACK. While this idea works well
> without having to detect all sorts of crashes, the primary may not
> come up if any unresponsive standbys are present (currently, the
> primary continues to be operational for read-only queries at least
> irrespective of whether sync standbys have caught up or not).
>
I prefer this approach because the primary will open connections for
reads/writes as soon as the quorum policy defined in
synchronous_standby_names is satisfied.
If there is no progress from the sync standbys, then the Postgres admin has to
jump in regardless.
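
A rough sketch of the check the primary could run on restart, assuming the two synchronous_standby_names methods (FIRST and ANY). This is illustrative Python rather than PostgreSQL code: caught_up and its parameters are hypothetical, LSNs are modeled as plain integers, and a standby that is not connected is either missing from the map or mapped to None.

```python
def caught_up(method, num_sync, names, flush_lsn, target_lsn):
    """Return True if the sync policy is satisfied up to target_lsn.

    method     -- "FIRST" (priority based) or "ANY" (quorum based)
    num_sync   -- number of standbys that must have flushed target_lsn
    names      -- standby names in priority order
    flush_lsn  -- map of standby name to last reported flush LSN
    target_lsn -- the primary's flush LSN at the end of recovery
    """
    connected = [n for n in names if flush_lsn.get(n) is not None]
    if method == "ANY":
        # Quorum: any num_sync listed standbys are enough.
        acked = sum(1 for n in connected if flush_lsn[n] >= target_lsn)
        return acked >= num_sync
    if method == "FIRST":
        # Priority: the first num_sync connected standbys must all ack.
        chosen = connected[:num_sync]
        return (len(chosen) == num_sync
                and all(flush_lsn[n] >= target_lsn for n in chosen))
    raise ValueError("unknown method: " + method)
```

With ANY 2 (s1, s2, s3), the primary would open up as soon as any two standbys report a flush LSN at or past the target, even if the third is unresponsive.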
Thanks,
Satya