Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Date: 2020-10-29 04:01:59
Message-ID: 20201029040159.GA3803708@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Aug 29, 2020 at 10:34:33PM -0700, Noah Misch wrote:
> On Mon, May 25, 2020 at 12:00:33AM -0700, Noah Misch wrote:
> > On Mon, Apr 06, 2020 at 09:18:47PM -0700, Noah Misch wrote:
> > > On Mon, Apr 06, 2020 at 02:46:09PM -0400, Tom Lane wrote:
> > > > Noah Misch <noah(at)leadboat(dot)com> writes:
> > > > > On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
> > > > >> So I think what we're actually trying to accomplish here is to
> > > > >> ensure that instead of deleting up to half of the SLRU space
> > > > >> before the cutoff, we delete up to half-less-one-segment.
> > > > >> Maybe it should be half-less-two-segments, just to provide some
> > > > >> cushion against edge cases. Reading the first comment in
> > > > >> SetTransactionIdLimit makes one not want to trust too much in
> > > > >> arguments based on the exact value of xidWrapLimit, while for
> > > > >> the other SLRUs it was already unclear whether the edge cases
> > > > >> were exactly right.
> > > >
> > > > > That could be interesting insurance. While it would be sad for us to miss an
> > > > > edge case and print "must be vacuumed within 2 transactions" when wrap has
> > > > > already happened, reaching that message implies the DBA burned ~1M XIDs, all
> > > > > in single-user mode. More plausible is FreezeMultiXactId() overrunning the
> > > > > limit by tens of segments. Hence, if we do buy this insurance, let's skip far
> > > > > more segments. For example, instead of unlinking segments representing up to
> > > > > 2^31 past XIDs, we could divide that into an upper half that we unlink and a
> > > > > lower half. The lower half will stay in place; eventually, XID consumption
> > > > > will overwrite it. Truncation behavior won't change until the region of CLOG
> > > > > for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
> > > > > vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
> > > > > maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?
> >
> > > > Temporarily wasting some disk
> > > > space is a lot more palatable than corrupting data, and these code
> > > > paths are necessarily not terribly well tested. So +1 for more
> > > > insurance.
> > >
> > > Okay, I'll give that a try. I expect this will replace the PagePrecedes
> > > callback with a PageDiff callback such that PageDiff(a, b) < 0 iff
> > > PagePrecedes(a, b). PageDiff callbacks shall distribute return values
> > > uniformly in [INT_MIN,INT_MAX]. SimpleLruTruncate() will unlink segments
> > > where INT_MIN/2 < PageDiff(candidate, cutoff) < 0.
> >
> > While doing so, I found that slru-truncate-modulo-v2.patch did get edge cases
> > wrong, as you feared. In particular, if the newest XID reached xidStopLimit
> > and was in the first page of a segment, TruncateCLOG() would delete its
> > segment. Attached slru-truncate-modulo-v3.patch fixes that; as restitution, I
> > added unit tests covering that and other scenarios. Reaching the bug via XIDs
> > was hard, requiring one to burn 1000k-CLOG_XACTS_PER_PAGE=967k XIDs in
> > single-user mode. I expect the bug was easier to reach via pg_multixact.
> >
> > The insurance patch stacks on top of the bug fix patch. It does have a
> > negative effect on TruncateMultiXact(), which uses SlruScanDirCbFindEarliest
> > to skip truncation in corrupted clusters. SlruScanDirCbFindEarliest() gives
> > nonsense answers if "future" segments exist. That can happen today, but the
> > patch creates new ways to make it happen. The symptom is wasting yet more
> > space in pg_multixact. I am okay with this, since it arises only after one
> > fills pg_multixact 50% full. There are alternatives. We could weaken the
> > corruption defense in TruncateMultiXact() or look for another implementation
> > of equivalent defense. We could unlink, say, 75% or 95% of the "past" instead
> > of 50% (this patch) or >99.99% (today's behavior).
>
> Rebased the second patch. The first patch did not need a rebase.

Rebased both patches, necessitated by commit dee663f changing many of the same
spots. I've updated one of the log messages for 592a589 having landed.

I've also changed a patch name stem from slru-truncate-insurance to
slru-truncate-t-insurance, so it sorts after the other patch. Perhaps that
will trick http://cfbot.cputube.org/noah-misch.html into applying the patches
in the right order. If not, continue to ignore cfbot.

Attachment Content-Type Size
slru-truncate-modulo-v4.patch text/plain 27.8 KB
slru-truncate-t-insurance-v3.patch text/plain 39.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-10-29 04:26:45 Re: psql \df choose functions by their arguments
Previous Message Andres Freund 2020-10-29 04:00:30 Re: recovering from "found xmin ... from before relfrozenxid ..."