Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Date: 2020-04-07 04:18:47
Message-ID: 20200407041847.GA519993@rfd.leadboat.com

On Mon, Apr 06, 2020 at 02:46:09PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
> >> So I think what we're actually trying to accomplish here is to
> >> ensure that instead of deleting up to half of the SLRU space
> >> before the cutoff, we delete up to half-less-one-segment.
> >> Maybe it should be half-less-two-segments, just to provide some
> >> cushion against edge cases. Reading the first comment in
> >> SetTransactionIdLimit makes one not want to trust too much in
> >> arguments based on the exact value of xidWrapLimit, while for
> >> the other SLRUs it was already unclear whether the edge cases
> >> were exactly right.
>
> > That could be interesting insurance. While it would be sad for us to miss an
> > edge case and print "must be vacuumed within 2 transactions" when wrap has
> > already happened, reaching that message implies the DBA burned ~1M XIDs, all
> > in single-user mode. More plausible is FreezeMultiXactId() overrunning the
> > limit by tens of segments. Hence, if we do buy this insurance, let's skip far
> > more segments. For example, instead of unlinking segments representing up to
> > 2^31 past XIDs, we could divide that into an upper half that we unlink and a
> > lower half. The lower half will stay in place; eventually, XID consumption
> > will overwrite it. Truncation behavior won't change until the region of CLOG
> > for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
> > vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
> > maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?
>
> Hmm. I'm not particularly concerned about the disk-space-consumption
> angle, but I do wonder about whether we'd be sacrificing the ability to
> recover cleanly from a situation that the code does let you get into.
> However, as long as we're sure that the system will ultimately reuse/
> recycle a not-deleted old segment file without complaint, it's hard to
> find much fault with your proposal.

That is the trade-off. By distancing ourselves from the wraparound edge
cases, we'll get more segment recycling (heretofore itself an edge case).
Fortunately, recycling's behavior doesn't change as you approach some limit;
it either works or it doesn't.
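
(For concreteness, the arithmetic behind those numbers: CLOG stores two
status bits per XID, so a full 2^32-XID cycle maps to 2^30 bytes = 1 GiB of
CLOG, and the 2^31 pre-oldestXact XIDs map to 512 MiB. Unlinking only the
upper 256 MiB of that region, and leaving the lower 256 MiB in place for XID
consumption to overwrite, is what raises the possible CLOG maximum from
512 MiB to 768 MiB.)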

> Temporarily wasting some disk
> space is a lot more palatable than corrupting data, and these code
> paths are necessarily not terribly well tested. So +1 for more
> insurance.

Okay, I'll give that a try. I expect this will replace the PagePrecedes
callback with a PageDiff callback such that PageDiff(a, b) < 0 iff
PagePrecedes(a, b). PageDiff callbacks shall distribute return values
uniformly over [INT_MIN, INT_MAX]. SimpleLruTruncate() will unlink segments
where INT_MIN/2 < PageDiff(candidate, cutoff) < 0.
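
To sketch the CLOG case (a rough illustration only: the function names, the
BLCKSZ value, and the rescaling trick below are my assumptions, not code
from a patch):

#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ              8192            /* assumes 8 kB pages */
#define CLOG_XACTS_PER_PAGE (BLCKSZ * 4)    /* two status bits per xact */

/* Page numbers wrap once per XID epoch: 2^32 / CLOG_XACTS_PER_PAGE pages.
 * Since the cycle length is a power of two, uint32 wraparound in the
 * subtraction below stays congruent modulo the cycle. */
#define CLOG_PAGES_PER_CYCLE ((UINT64_C(1) << 32) / CLOG_XACTS_PER_PAGE)

/*
 * Wraparound-aware page distance: negative iff page1 precedes page2, with
 * results spread uniformly over [INT_MIN, INT_MAX] by rescaling the wrapped
 * page distance to the full width of int.  Reinterpreting the high bit as
 * a sign bit assumes two's complement.
 */
static int
CLOGPageDiff(int page1, int page2)
{
    uint32_t dist = ((uint32_t) page1 - (uint32_t) page2)
                    % CLOG_PAGES_PER_CYCLE;
    uint64_t scale = (UINT64_C(1) << 32) / CLOG_PAGES_PER_CYCLE;

    return (int32_t) (uint32_t) (dist * scale);
}

/*
 * SimpleLruTruncate() would then unlink a segment only when its page sits
 * in the nearer half of the space behind the cutoff.
 */
static bool
CLOGPageIsUnlinkable(int candidate, int cutoff)
{
    int diff = CLOGPageDiff(candidate, cutoff);

    return diff > INT_MIN / 2 && diff < 0;
}

With a uniform PageDiff, INT_MIN/2 corresponds to a quarter of the XID
cycle, so segments in the nearer half of the pre-cutoff range get unlinked
while the farther half stays in place until XID consumption overwrites it.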
