Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Date: 2020-08-30 05:34:33
Message-ID: 20200830053433.GA3275593@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 25, 2020 at 12:00:33AM -0700, Noah Misch wrote:
> On Mon, Apr 06, 2020 at 09:18:47PM -0700, Noah Misch wrote:
> > On Mon, Apr 06, 2020 at 02:46:09PM -0400, Tom Lane wrote:
> > > Noah Misch <noah(at)leadboat(dot)com> writes:
> > > > On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
> > > >> So I think what we're actually trying to accomplish here is to
> > > >> ensure that instead of deleting up to half of the SLRU space
> > > >> before the cutoff, we delete up to half-less-one-segment.
> > > >> Maybe it should be half-less-two-segments, just to provide some
> > > >> cushion against edge cases. Reading the first comment in
> > > >> SetTransactionIdLimit makes one not want to trust too much in
> > > >> arguments based on the exact value of xidWrapLimit, while for
> > > >> the other SLRUs it was already unclear whether the edge cases
> > > >> were exactly right.
> > >
> > > > That could be interesting insurance. While it would be sad for us to miss an
> > > > edge case and print "must be vacuumed within 2 transactions" when wrap has
> > > > already happened, reaching that message implies the DBA burned ~1M XIDs, all
> > > > in single-user mode. More plausible is FreezeMultiXactId() overrunning the
> > > > limit by tens of segments. Hence, if we do buy this insurance, let's skip far
> > > > more segments. For example, instead of unlinking segments representing up to
> > > > 2^31 past XIDs, we could divide that into an upper half that we unlink and a
> > > > lower half. The lower half will stay in place; eventually, XID consumption
> > > > will overwrite it. Truncation behavior won't change until the region of CLOG
> > > > for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
> > > > vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
> > > > maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?
>
> > > Temporarily wasting some disk
> > > space is a lot more palatable than corrupting data, and these code
> > > paths are necessarily not terribly well tested. So +1 for more
> > > insurance.
> >
> > Okay, I'll give that a try. I expect this will replace the PagePrecedes
> > callback with a PageDiff callback such that PageDiff(a, b) < 0 iff
> > PagePrecedes(a, b). PageDiff callbacks shall distribute return values
> > uniformly in [INT_MIN,INT_MAX]. SimpleLruTruncate() will unlink segments
> > where INT_MIN/2 < PageDiff(candidate, cutoff) < 0.
>
> While doing so, I found that slru-truncate-modulo-v2.patch did get edge cases
> wrong, as you feared. In particular, if the newest XID reached xidStopLimit
> and was in the first page of a segment, TruncateCLOG() would delete its
> segment. Attached slru-truncate-modulo-v3.patch fixes that; as restitution, I
> added unit tests covering that and other scenarios. Reaching the bug via XIDs
> was hard, requiring one to burn 1000k-CLOG_XACTS_PER_PAGE=967k XIDs in
> single-user mode. I expect the bug was easier to reach via pg_multixact.
>
> The insurance patch stacks on top of the bug fix patch. It does have a
> negative effect on TruncateMultiXact(), which uses SlruScanDirCbFindEarliest
> to skip truncation in corrupted clusters. SlruScanDirCbFindEarliest() gives
> nonsense answers if "future" segments exist. That can happen today, but the
> patch creates new ways to make it happen. The symptom is wasting yet more
> space in pg_multixact. I am okay with this, since it arises only after one
> fills pg_multixact 50% full. There are alternatives. We could weaken the
> corruption defense in TruncateMultiXact() or look for another implementation
> of equivalent defense. We could unlink, say, 75% or 95% of the "past" instead
> of 50% (this patch) or >99.99% (today's behavior).

Rebased the second patch. The first patch did not need a rebase.

Attachment Content-Type Size
slru-truncate-modulo-v3.patch text/plain 28.3 KB
slru-truncate-insurance-v2.patch text/plain 39.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-08-30 07:05:48 Re: More tests with USING INDEX replident and dropped indexes
Previous Message Thomas Munro 2020-08-30 03:51:51 Rare link canary failure in dblink test