Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Date: 2020-05-25 07:00:33
Message-ID: 20200525070033.GA1591335@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 06, 2020 at 09:18:47PM -0700, Noah Misch wrote:
> On Mon, Apr 06, 2020 at 02:46:09PM -0400, Tom Lane wrote:
> > Noah Misch <noah(at)leadboat(dot)com> writes:
> > > On Wed, Mar 25, 2020 at 04:42:31PM -0400, Tom Lane wrote:
> > >> So I think what we're actually trying to accomplish here is to
> > >> ensure that instead of deleting up to half of the SLRU space
> > >> before the cutoff, we delete up to half-less-one-segment.
> > >> Maybe it should be half-less-two-segments, just to provide some
> > >> cushion against edge cases. Reading the first comment in
> > >> SetTransactionIdLimit makes one not want to trust too much in
> > >> arguments based on the exact value of xidWrapLimit, while for
> > >> the other SLRUs it was already unclear whether the edge cases
> > >> were exactly right.
> >
> > > That could be interesting insurance. While it would be sad for us to miss an
> > > edge case and print "must be vacuumed within 2 transactions" when wrap has
> > > already happened, reaching that message implies the DBA burned ~1M XIDs, all
> > > in single-user mode. More plausible is FreezeMultiXactId() overrunning the
> > > limit by tens of segments. Hence, if we do buy this insurance, let's skip far
> > > more segments. For example, instead of unlinking segments representing up to
> > > 2^31 past XIDs, we could divide that into an upper half that we unlink and a
> > > lower half. The lower half will stay in place; eventually, XID consumption
> > > will overwrite it. Truncation behavior won't change until the region of CLOG
> > > for pre-oldestXact XIDs exceeds 256 MiB. Beyond that threshold,
> > > vac_truncate_clog() will unlink the upper 256 MiB and leave the rest. CLOG
> > > maximum would rise from 512 MiB to 768 MiB. Would that be worthwhile?

> > Temporarily wasting some disk
> > space is a lot more palatable than corrupting data, and these code
> > paths are necessarily not terribly well tested. So +1 for more
> > insurance.
>
> Okay, I'll give that a try. I expect this will replace the PagePrecedes
> callback with a PageDiff callback such that PageDiff(a, b) < 0 iff
> PagePrecedes(a, b). PageDiff callbacks shall distribute return values
> uniformly in [INT_MIN,INT_MAX]. SimpleLruTruncate() will unlink segments
> where INT_MIN/2 < PageDiff(candidate, cutoff) < 0.

While doing so, I found that slru-truncate-modulo-v2.patch did get edge cases
wrong, as you feared. In particular, if the newest XID reached xidStopLimit
and was in the first page of a segment, TruncateCLOG() would delete its
segment. Attached slru-truncate-modulo-v3.patch fixes that; as restitution, I
added unit tests covering that and other scenarios. Reaching the bug via XIDs
was hard, requiring one to burn 1000k-CLOG_XACTS_PER_PAGE=967k XIDs in
single-user mode. I expect the bug was easier to reach via pg_multixact.

The insurance patch stacks on top of the bug fix patch. It does have a
negative effect on TruncateMultiXact(), which uses SlruScanDirCbFindEarliest
to skip truncation in corrupted clusters. SlruScanDirCbFindEarliest() gives
nonsense answers if "future" segments exist. That can happen today, but the
patch creates new ways to make it happen. The symptom is wasting yet more
space in pg_multixact. I am okay with this, since it arises only after one
fills pg_multixact 50% full. There are alternatives. We could weaken the
corruption defense in TruncateMultiXact() or look for another implementation
of equivalent defense. We could unlink, say, 75% or 95% of the "past" instead
of 50% (this patch) or >99.99% (today's behavior).

Thanks,
nm

Attachment Content-Type Size
slru-truncate-modulo-v3.patch text/plain 28.3 KB
slru-truncate-insurance-v1.patch text/plain 39.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-05-25 07:03:46 Re: pg13 docs: minor fix for "System views" list
Previous Message Michael Paquier 2020-05-25 06:57:35 Re: PostgresSQL 13.0 Beta 1 on Phoronix