Re: pgsql: snapshot scalability: cache snapshots using a xact completion co

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: snapshot scalability: cache snapshots using a xact completion co
Date: 2020-08-18 20:28:05
Message-ID: 20200818202805.5f7rbfh2zyfkzqqq@alap3.anarazel.de
Lists: pgsql-committers pgsql-hackers

Hi,

On 2020-08-18 01:21:17 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I'd written to Tom that I was planning to revert unless the number of
> > failures were lower than initially indicated. But that actually seems to
> > have come to pass (the failures are quicker to report because they don't
> > run the subsequent tests, of course). I'd like to let the failures
> > accumulate a bit longer, say until tomorrow Midday if I haven't figured
> > it out by then. With the hope of finding some detail to help pinpoint
> > the issue.
>
> There's certainly no obvious pattern here, so I agree with waiting for
> more data.

FWIW, I think I have found the bug, but I'm still working to reproduce
the issue reliably enough that I can verify that the fix actually works.

The issue is basically that 2PC PREPARE is weird with respect to the
procarray. As usual, the last snapshot built with GetSnapshotData()
before the PREPARE doesn't include the backend's own transaction in
->xip[]. PrepareTransaction() removes the "normal" entry with
ProcArrayClearTransaction(), which so far doesn't increase the xact
completion count. Because the xact completion count is not increased,
snapshots taken before the 2PC transaction is finished can keep being
reused. That's fine for other backends, but not for the backend doing
the PrepareTransaction(), because its ->xip[] doesn't include its own
transaction.

It's a bit tricky to reproduce exactly the issue the BF is occasionally
hitting, because the way ->xmax is computed *limits* the
damage. Combined with the use of SERIALIZABLE (preventing recomputation
of the data snapshot) that makes it somewhat hard to hit.
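
The scenario described above can be sketched with a toy model. This is
not PostgreSQL code: ToyProcArray, Snapshot, treats_as_running(), the
bump_count flag and the with_later_commit flag are all hypothetical
names made up for illustration, and the visibility rule is heavily
simplified. The bump_count flag only models one plausible remedy
(having the PREPARE path advance the completion count); the message
itself does not state what the fix is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmax: int                # xids >= xmax are treated as still running
    xip: frozenset           # xids below xmax that were running
    completion_count: int    # counter value when the snapshot was built

    def treats_as_running(self, xid):
        return xid >= self.xmax or xid in self.xip


class ToyProcArray:
    """Hypothetical, simplified stand-in for the procarray."""

    def __init__(self):
        self.running = {}        # backend -> xid ("normal" entries)
        self.prepared = set()    # xids of prepared (2PC) transactions
        self.next_xid = 100
        self.latest_completed = 99
        self.completion_count = 0
        self.cache = {}          # backend -> last snapshot it built

    def begin(self, backend):
        xid = self.next_xid
        self.next_xid += 1
        self.running[backend] = xid
        return xid

    def commit(self, backend):
        xid = self.running.pop(backend)
        self.latest_completed = max(self.latest_completed, xid)
        self.completion_count += 1

    def prepare(self, backend, bump_count):
        # Drop the "normal" entry; the xid lives on as a prepared xact.
        xid = self.running.pop(backend)
        self.prepared.add(xid)
        if bump_count:           # assumption: one possible remedy
            self.completion_count += 1

    def get_snapshot(self, backend):
        cached = self.cache.get(backend)
        if cached is not None and cached.completion_count == self.completion_count:
            return cached        # reuse: nothing "completed" since it was built
        own = self.running.get(backend)
        # A backend's own xid is left out of its own snapshot.
        xip = {x for x in self.running.values() if x != own} | self.prepared
        snap = Snapshot(self.latest_completed + 1, frozenset(xip),
                        self.completion_count)
        self.cache[backend] = snap
        return snap


def snapshot_after_prepare(bump_count, with_later_commit=True):
    """Does backend A's post-PREPARE snapshot treat A's xid as running?"""
    procs = ToyProcArray()
    xid_a = procs.begin("A")
    if with_later_commit:
        procs.begin("B")         # a later xid commits, pushing xmax past A
        procs.commit("B")
    procs.get_snapshot("A")      # cached; xip omits A's own xid
    procs.prepare("A", bump_count)
    return procs.get_snapshot("A").treats_as_running(xid_a)


if __name__ == "__main__":
    print(snapshot_after_prepare(bump_count=False))
    print(snapshot_after_prepare(bump_count=True))
    print(snapshot_after_prepare(bump_count=False, with_later_commit=False))
```

In this model the stale snapshot only misjudges the prepared
transaction when a later xid has already completed. Without that, xmax
alone still marks the xid as running, which mirrors the point that the
->xmax computation limits the damage.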

Greetings,

Andres Freund
