Re: [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups
Date: 2016-12-28 14:16:34
Message-ID: 979ff13d-0b8e-4937-01e8-2925c0adc306@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28/12/16 15:01, Craig Ringer wrote:
> Hi all
>
> There's a minor race between commit_ts SLRU truncation and concurrent
> commit_ts lookups, where a lookup can check the lower valid bound xid
> without knowing it's already been truncated away. This would result in
> a SLRU lookup error.
>
> It's pretty low-harm since it's hard to trigger and the only problem
> is an error being emitted when we should otherwise return null/zero.
> Most notably you have to pass an xid that used to be within the
> datrozenxid but due to a concurrent vacuum has just moved outside it.
> This can't happen if you're passing the xmin of a tuple that still
> exists so it only matters for callers passing arbitrary XIDs in via
> pg_xact_commit_timestamp(...).
>
> The race window is bigger on standby because there we don't find out
> about the advance of the lower commit ts bound until the next
> checkpoint. But you still have to be looking at very old xids that
> don't exist on the heap anymore.
>
> We might as well fix it in HEAD, but it's totally pointless to
> back-patch, and the only part of the race that can be realistically
> hit is on standby, where we can't backpatch a fix w/o changing the
> xlog format. Nope. We could narrow the scope by limiting commit_ts
> slru truncation to just before a checkpoint, but given how hard this
> is to hit... I don't care.
>
> (This came up as part of the investigation I've been doing on the
> txid_status thread, where Robert pointed out a similar problem that
> can arise where txid_status races with clog truncation. I noticed the
> issue with standby while looking into that.)
>

Hi,

I remember thinking this might affect committs when I was reading that
thread but didn't have time to investigate it yet myself. Thanks for
doing the all the work yourself.

About the patch, it looks good to me for master with the minor exception
that:
> + appendStringInfo(buf, "pageno %d, xid %u",
> + trunc.pageno, trunc.oldestXid);

This should probably say oldestXid instead of xid in the text description.

About back-patching, I wonder if standby could be modified to move
oldestXid based on pageno reverse calculation, but it's going to be
slightly ugly.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2016-12-28 14:17:16 make more use of RoleSpec struct
Previous Message Pavel Stehule 2016-12-28 14:04:13 Re: proposal: session server side variables