Re: [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups
Date: 2017-01-20 13:40:28
Message-ID: 20170120134028.bhgszattkbrwbdqm@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Eisentraut wrote:
> On 1/19/17 10:06 AM, Alvaro Herrera wrote:
> >> Also, I wonder whether we should not in vacuum.c change the order of the
> >> calls of SetTransactionIdLimit() and SetMultiXactIdLimit() as well, just
> >> to keep everything consistent.
> >
> > I am wary of doing that. The current coding is well battle-tested by
> > now, but doing things in the opposite order, not at all. Pending some
> > testing to show that there is no problem with a change, I would leave
> > things as they are.
>
> I appreciate this hesitation.
>
> The issue the patch under discussion is addressing is that we have a
> user-space interface to read information about commit timestamps. So if
> we truncate away old information before we update the mark where the
> good information starts, then we get the race.

Correct.

> We don't have a user-space interface reading, say, the clog, but it
> doesn't seem implausible for that to exist at some point. How would
> this code have to be structured then?

One option would be to add another limit Xid which advances before the
truncation but which is not used for other decisions other than limiting
what can users consult. Another option is not to implement direct reads
from the clog. Yet another option is that before we add such interface
somebody produces proof that the problem does not, in fact, exist. I
think producing proof is onerous enough that nobody will step up to it
until there is some real need for user-invoked clog lookups. (We've
gone so long without them, perhaps we will never need them.)

> > What I fear is: what
> > happens if a concurrent checkpoint reads the values between the two
> > operations, and a crash occurs? I think that the checkpoint might save
> > the updated values, so after crash recovery the truncate would not be
> > executed, possibly leaving files around. Leaving files around might be
> > dangerous for multixacts at least (it's probably harmless for xids).
>
> Why is leaving files around dangerous?

Leftover files could confuse the truncation algorithm. We already had
this problem for multixacts, and it took a lot of sweat and blood to get
it fixed. The algorithm has since been fixed, but the mechanism is
delicate enough that I am afraid that tinkering with it may break it.

> If this is a problem, why is it not a problem for commit timestamps?
> I don't understand why these different SLRU uses are different.

commit_ts heavily depends on the clog cycle, so my thinking is that if
clog is protected enough, then so is commit_ts. Given a strong clog, I
expect commit_ts not to break. If I make clog brittle, will that break
commit_ts too? Perhaps. Maybe not, but again that would require more
study.

I have looked at clog a bit more this morning and I think perhaps it is
safe to make it work in the same way as commit_ts -- but somebody would
have to go to the trouble of verifying that it is.

> Yeah, we could go ahead with this patch as is and it might be fine, but
> I feel like we don't understand the broader issue completely yet.

ISTM the more complicated uses of slru.c have turned out to have a lot
of emergent properties that weren't completely understood, so yeah I
agree that we don't (or at least I don't, and most people don't either).
Someday, somebody will rewrite slru.c and/or remove multixacts, and
these problems will all go away.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2017-01-20 13:44:12 Re: Improvements in psql hooks for variables
Previous Message Michael Paquier 2017-01-20 13:14:14 Re: Patch to implement pg_current_logfile() function