Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Date: 2020-03-25 20:42:31
Message-ID: 13116.1585168951@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Thu, Mar 19, 2020 at 06:04:52PM -0400, Tom Lane wrote:
>> 1. It seems like this discussion is conflating two different issues.

> When the newest XID and the oldest XID fall in "opposite" segments in the XID
> space, we must not unlink the segment containing the newest XID. That is the
> chief goal at present.

Got it. Thanks for clarifying the scope of the patch.

>> 3. It feels like the proposed test of cutoff position against both
>> ends of a segment is a roundabout way of fixing the problem. I
>> wonder whether we ought not pass *both* the cutoff and the current
>> endpoint (latest_page_number) down to the truncation logic, and
>> have it compare against both of those values.

> Since latest_page_number can keep changing throughout SlruScanDirectory()
> execution, that would give a false impression of control. Better to
> demonstrate that the xidWrapLimit machinery keeps latest_page_number within
> acceptable constraints than to ascribe significance to a comparison with a
> stale latest_page_number.

Perhaps. I'm prepared to accept that line of argument so far as the clog
SLRU goes, but I'm not convinced that the other SLRUs have equally robust
defenses against advancing too far. So on the whole I'd rather that the
SLRU logic handled this issue strictly on the basis of what it knows,
without assumptions about what calling code may be doing. Still, maybe
we only really care about the risk for the clog SLRU?

>> To try to clarify this in my head, I thought about an image of
>> the modular XID space as an octagon, where each side would correspond to
>> a segment file if we chose numbers such that there were only 8 possible
>> segment files.

> Exactly.
> https://docs.google.com/drawings/d/1xRTbQ4DVyP5wI1Ujm_gmmY-cC8KKCjahEtsU_o0fC7I
> uses your octagon to show the behaviors before and after this patch.

Cool, thanks for drafting that up. (My original sketch was not of
publishable quality ;-).) To clarify, the upper annotations probably
ought to read "nextXid <= xidWrapLimit"? And "cutoffPage" ought
to be affixed to the orange dot at lower right of the center image?

I agree that this diagram depicts why we have a problem right now,
and the right-hand image shows what we want to have happen.
What's a little less clear is whether the proposed patch achieves
that effect.

In particular, after studying this awhile, it seems like removal
of the initial "cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT"
adjustment isn't really affecting anything. It's already the case
that just by allowing oldestXact to get rounded back to an SLRU page
boundary, we've created some daylight between oldestXact and the
cutoff point. Rounding back further within the same SLRU segment
changes nothing. (For example, suppose that oldestXact is already
within the oldest page of its SLRU segment. Then either rounding
rule has the same effect. But there's still a little bit of room for
xidWrapLimit to be in the opposite SLRU segment, allowing trouble.)

So I think what we're actually trying to accomplish here is to
ensure that instead of deleting up to half of the SLRU space
before the cutoff, we delete up to half-less-one-segment.
Maybe it should be half-less-two-segments, just to provide some
cushion against edge cases. Reading the first comment in
SetTransactionIdLimit makes one not want to trust too much in
arguments based on the exact value of xidWrapLimit, while for
the other SLRUs it was already unclear whether the edge cases
were exactly right.

In any case, it feels like the specific solution you have here,
of testing both ends of the segment, is a roundabout way of
providing that one-segment slop; and it doesn't help if we decide
we need two-segment slop. Can we write the test in a way that
explicitly provides N segments of slop?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2020-03-25 20:45:40 Re: Allow continuations in "pg_hba.conf" files
Previous Message Robert Haas 2020-03-25 20:32:15 Re: plan cache overhead on plpgsql expression