Re: [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups
Date: 2017-01-23 04:24:38
Message-ID: CAMsr+YE7xH2_fAZp6-o7L+T437wWagMZhN0ds0GsE22pi222Fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20 January 2017 at 05:32, Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> On 1/19/17 10:06 AM, Alvaro Herrera wrote:
>>> Also, I wonder whether we should not in vacuum.c change the order of the
>>> calls of SetTransactionIdLimit() and SetMultiXactIdLimit() as well, just
>>> to keep everything consistent.
>>
>> I am wary of doing that. The current coding is well battle-tested by
>> now, but doing things in the opposite order, not at all. Pending some
>> testing to show that there is no problem with a change, I would leave
>> things as they are.
>
> I appreciate this hesitation.
>
> The issue the patch under discussion is addressing is that we have a
> user-space interface to read information about commit timestamps. So if
> we truncate away old information before we update the mark where the
> good information starts, then we get the race. We don't have a
> user-space interface reading, say, the clog, but it doesn't seem
> implausible for that to exist at some point. How would this code have
> to be structured then?

I have a patch in the current commitfest that exposes just such a user
interface, txid_status() .

See https://commitfest.postgresql.org/12/730/ .

Robert was about to commit when he identified this race in commit
status lookup, which led me to identify the same race addressed here
for commit timestamps.

>> What I fear is: what
>> happens if a concurrent checkpoint reads the values between the two
>> operations, and a crash occurs? I think that the checkpoint might save
>> the updated values, so after crash recovery the truncate would not be
>> executed, possibly leaving files around. Leaving files around might be
>> dangerous for multixacts at least (it's probably harmless for xids).
>
> Why is leaving files around dangerous?

As far as I can tell, leaving the files around as such isn't dangerous.

The problem arises if we wrap around and start treating old SLRU
contents as new again without first truncating them away.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2017-01-23 04:34:27 Re: [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups
Previous Message Tsunakawa, Takayuki 2017-01-23 04:23:49 Re: Commit fest 2017-01 will begin soon!