Re: Is txid_status() actually safe? / What is 011_crash_recovery.pl testing?

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Is txid_status() actually safe? / What is 011_crash_recovery.pl testing?
Date: 2021-05-05 15:15:53
Message-ID: CAMsr+YHPZetkB0OeBsBu76+Km=Mb4SEbP1=-jJ6fF9MGSYn8mw@mail.gmail.com
Lists: pgsql-hackers

On Tue, 9 Feb 2021 at 05:52, Andres Freund <andres(at)anarazel(dot)de> wrote:

>
> Craig, it kind of looks to me like you assumed it'd be guaranteed that
> the xid at this point would show in-progress?
>

At the time I wrote that code, I don't think I understood that xid
assignment wasn't necessarily durable until either (a) the next checkpoint
or (b) the commit of some txn with a greater xid.
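Below is a toy Python model of that point. It is a simplification for illustration and is not PostgreSQL's recovery code. It assumes that after a crash, the first xid handed out is the larger of two values: the next-xid value saved by the last checkpoint, and one past the highest xid found in WAL that was durably flushed before the crash. The name `recovered_next_xid` is hypothetical.

```python
from typing import Iterable


def recovered_next_xid(checkpoint_next_xid: int, flushed_wal_xids: Iterable[int]) -> int:
    """Toy model: the first xid handed out after crash recovery.

    Only durable state counts: the next-xid value saved by the last
    checkpoint and the xids carried by WAL records flushed before the
    crash. An xid that was assigned but never reached durable WAL leaves
    no trace, so it can be handed out again.
    """
    highest_seen = max(flushed_wal_xids, default=checkpoint_next_xid - 1)
    return max(checkpoint_next_xid, highest_seen + 1)


# The last checkpoint saved next xid 100. Xids 100, 101 and 102 were then
# assigned, but only WAL carrying xid 100 was flushed before the crash.
print(recovered_next_xid(100, [100]))  # 101, so 101 and 102 can be reused
# If xid 102 had durably committed, 100..102 would all be safe:
print(recovered_next_xid(100, [100, 102]))  # 103
```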

IIRC I expected that after crash and recovery the txn would always be
treated as aborted, because the xid had been assigned but no corresponding
commit was found before end-of-recovery. No explicit abort records are
written to WAL for such txns, since the server crashed, but the server's
oldest in-progress txn threshold is used to determine that they must be
aborted rather than in-progress, even though their clog entries aren't set
to aborted.
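Here is a toy Python model of that classification, under some simplifying assumptions. Nothing is running when recovery ends. `next_xid` is the first xid the recovered server will hand out, and the model uses it as the cut-off. `committed_xids` holds the xids whose commit records were durably flushed. The real txid_status() handles more cases, such as xids whose clog data has already been truncated. `status_after_recovery` is a hypothetical name.

```python
def status_after_recovery(xid, committed_xids, next_xid):
    """Toy model of how an xid is classified once crash recovery has finished.

    No transactions are running yet, so there is no "in progress" answer:
    an xid below next_xid either has a durable commit record or is
    treated as aborted, even though no abort record was ever written.
    """
    if xid >= next_xid:
        # As far as the recovered server knows, this xid was never assigned.
        raise ValueError(f"transaction ID {xid} is in the future")
    return "committed" if xid in committed_xids else "aborted"


print(status_after_recovery(100, {100}, next_xid=103))  # committed
print(status_after_recovery(101, {100}, next_xid=103))  # aborted
```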

Which was fine as far as it went, but I failed to account for the xid
assignment not necessarily being durable when the client calls
txid_status().

> I don't think the use of txid_status() described in the docs added in
> the commit is actually ever safe?
>

I agree. The client can query for its xid with txid_current(), but, as you
note, there's no guarantee that the assigned xid is durable.

The client would have to ensure that an xid was assigned, then ensure that
the WAL was durably flushed past the point of the xid assignment before
relying on the xid.
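The following toy Python sketch shows that ordering from the client's side. `ToyBackend` and its `flush_wal()` method are hypothetical stand-ins, not real PostgreSQL or driver calls. The model assumes the xid is assigned on the txn's first write. It treats the "point of the xid assignment" as the LSN of the first WAL record carrying that xid. The client can't see that LSN, so it flushes up to the current insert position, which is necessarily past it.

```python
class ToyBackend:
    """Hypothetical model of one session's WAL state (not a real API)."""

    def __init__(self, next_xid):
        self.next_xid = next_xid
        self.xid = None       # top-level xid, assigned lazily
        self.insert_lsn = 0   # end of WAL written so far
        self.flushed_lsn = 0  # end of WAL known to be on durable storage

    def write(self):
        # The first write assigns an xid; every write appends a WAL record
        # carrying that xid, which is not flushed yet.
        if self.xid is None:
            self.xid = self.next_xid
            self.next_xid += 1
        self.insert_lsn += 1

    def flush_wal(self):
        # Hypothetical "flush WAL up to the current insert LSN" call.
        self.flushed_lsn = self.insert_lsn


def get_xid_safe_to_record(backend):
    backend.write()      # 1. ensure an xid has been assigned
    xid = backend.xid    # 2. what txid_current() would report
    backend.flush_wal()  # 3. ensure WAL is durable past the assignment
    return xid           # only now is it safe to store client-side


b = ToyBackend(next_xid=100)
print(get_xid_safe_to_record(b), b.flushed_lsn)  # 100 1
```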

Suppose a txn performs a small write, calls txid_current(), and sends a
commit, and the server crashes before completing that commit. We can't know
for sure that the xid we recorded client-side before the crash still
identifies the same txn when we check its status after crash recovery. Some
other txn could have reused the xid after the crash, so long as no other txn
with a greater xid durably committed before the crash.
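Here is the same scenario as a toy Python walk-through. It keeps the simplifying assumption that recovery resumes handing out xids at the larger of two values: the checkpoint's saved next xid, and one past the highest xid in durably flushed WAL. The function and variable names are illustrative only.

```python
def simulate_reuse():
    """Toy model of an xid being reused after a crash."""
    checkpoint_next_xid = 100
    flushed_wal_xids = []  # nothing carrying xid >= 100 reached disk

    # Client's txn: small write, txid_current() returns 100, COMMIT is sent,
    # then the server crashes before the commit record is flushed.
    client_recorded_xid = 100

    # Crash recovery: only durable state survives.
    next_xid = max([checkpoint_next_xid] + [x + 1 for x in flushed_wal_xids])
    committed = set()

    # After restart, an unrelated txn is assigned the same xid and commits.
    other_xid = next_xid
    committed.add(other_xid)

    # The client checks "its" xid and sees the other txn's commit.
    status = "committed" if client_recorded_xid in committed else "aborted"
    return client_recorded_xid, other_xid, status


print(simulate_reuse())  # (100, 100, 'committed'), though the client's txn was lost
```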

That scenario isn't hugely likely, but it's definitely possible on systems
that don't do a lot of concurrent txns or do mostly long, heavyweight txns.

The txid_status() function was originally intended to be paired with a way
to report topxid assignment to the client automatically, NOTIFY or
GUC_REPORT-style. But that would not make this usage safe either, unless we
delayed the report until WAL was flushed past the LSN of the xid assignment
*or* some other txn with a greater xid committed.
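That reporting condition could be written as a predicate. This is a hypothetical sketch, not an existing or proposed API. It assumes `assign_lsn` is the LSN of the first WAL record carrying the xid. `flushed_lsn` is how far WAL has been durably flushed. `max_durable_committed_xid` is the highest xid whose commit record is known to be flushed, or None if there is none. For brevity it ignores the checkpoint case.

```python
from typing import Optional


def xid_report_is_safe(xid: int, assign_lsn: int, flushed_lsn: int,
                       max_durable_committed_xid: Optional[int]) -> bool:
    """True once a crash can no longer cause `xid` to be handed out again."""
    if flushed_lsn >= assign_lsn:
        return True  # the assignment itself is durable
    # Otherwise, a durable commit of a greater xid pushes recovery past ours.
    return max_durable_committed_xid is not None and max_durable_committed_xid > xid


print(xid_report_is_safe(100, assign_lsn=50, flushed_lsn=40, max_durable_committed_xid=None))  # False
print(xid_report_is_safe(100, assign_lsn=50, flushed_lsn=40, max_durable_committed_xid=105))   # True
print(xid_report_is_safe(100, assign_lsn=50, flushed_lsn=60, max_durable_committed_xid=None))  # True
```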

This could be made safe with a variant of txid_current() that forced the
xid assignment to be logged immediately if it was not already, and did not
return until WAL flushed past the point of the assignment. If the client
did most of the txn's work before requesting a guaranteed-durable xid, it
would in practice not land up having to wait for a flush. But we'd have to
keep track of the WAL position at which we assigned the xid in every single
topxact in order to be able to promise we'd flushed past it without having
to immediately force a
flush. That's pointless overhead all the rest of the time, just in case
someone wants to get an xid for later use with txid_status().
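A toy Python sketch of that bookkeeping shows where the overhead lands. `ToyServer` and `durable_txid_current()` are hypothetical; nothing like them exists in PostgreSQL. Each top-level txn records the LSN of its first WAL record. The durable variant forces a flush only if WAL has not already been flushed past that point.

```python
class ToyServer:
    """Hypothetical model: remember, per top-level txn, where its xid first hit WAL."""

    def __init__(self):
        self.insert_lsn = 0
        self.flushed_lsn = 0
        self.assign_lsn = {}     # xid -> LSN of first WAL record with that xid
        self.forced_flushes = 0

    def write(self, xid):
        self.insert_lsn += 1
        # Per-transaction bookkeeping that every txn would pay for.
        self.assign_lsn.setdefault(xid, self.insert_lsn)

    def background_flush(self):
        # E.g. another txn's commit, or the WAL writer, flushing WAL.
        self.flushed_lsn = self.insert_lsn

    def durable_txid_current(self, xid):
        # Return only once WAL is flushed past the xid's assignment,
        # forcing a flush only if that hasn't already happened.
        if self.flushed_lsn < self.assign_lsn[xid]:
            self.flushed_lsn = self.insert_lsn
            self.forced_flushes += 1
        return xid


srv = ToyServer()
srv.write(100)           # the txn's first write assigns xid 100
srv.background_flush()   # WAL gets flushed while the txn keeps working
srv.write(100)           # more work; this record is not yet flushed
print(srv.durable_txid_current(100), srv.forced_flushes)  # 100 0
```

Suppose the txn had asked for its durable xid before any background flush. Then `forced_flushes` would be 1, and that is the case where the caller pays for a flush.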

The simplest option with no overhead on anything that doesn't care about
txid_status() is to expose a function to force flush of WAL up to the
current insert LSN. Then update the docs to say you have to call it after
txid_current() and before sending your commit. But at that point you might
as well use 2PC, since you're paying the same costs: two flushes and two
round trips. The main point of txid_status() was to avoid the cost of that
double flush.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise
