Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Ben Chobot <bench(at)silentmedia(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
Date: 2022-02-11 02:18:12
Message-ID: 20220211021812.wgv324etfaz4v4yu@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On 2022-02-11 10:38:34 +0900, Michael Paquier wrote:
> My impression is that we don't really need to change the WAL format
> based on the existing APIs we already have, or that in the worst case
> it would be possible to make things backward-compatible enough that it
> would not be a worry as long as the standbys are updated before the
> primaries.

I think it's a question of how "concurrently" we want concurrently to be on
hot standby nodes.

If we are OK with "not at all", we can just lock an AEL at the end of
WaitForLockersMultiple(). It's a bit harder to cause the necessary snapshot
conflict, but we could e.g. extend xl_running_xacts and use the record length
to keep both older primary -> newer replica, and the reverse working.

If we want to actually make concurrently work concurrently on the replica,
we'd have to work harder. That doesn't strike me as feasible for backpatching.

The most important bit bit would be to make WaitForOlderSnapshots() wait for
the xmin of hot_standby_feedback walsenders and/or physical slot xmin
horizons. For phase 5 & 6 we would probably have to do something like waiting
for snapshot conflicts vs walsenders/slots in addition to the
WaitForLockersMultiple() for conflicts on the primary.

Greetings,

Andres Freund

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Ryan Kelly 2022-02-11 03:21:56 ERROR: XX000: variable not found in subplan target list
Previous Message Tom Lane 2022-02-11 02:02:51 Re: BUG #17391: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL tests fail on OpenBSD 7.0