Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Date: 2020-01-05 01:19:16
Message-ID: 20200105011916.GC2270238@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 05, 2020 at 01:33:55AM +0100, Tomas Vondra wrote:
> On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
> >a. Fix the rounding in SimpleLruTruncate(). (The patch I posted upthread is
> > wrong; I will correct it in a separate message.)
> >
> >b. Arrange so only one backend runs vac_truncate_clog() at a time. Other than
> > AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
> > checkpoint, or in the startup process. Hence, also arrange for only one
> > backend to call SimpleLruTruncate(AsyncCtl) at a time.
> >
> >c. Test "apparent wraparound" before writing WAL, and don't WAL-log
> > truncations that "apparent wraparound" forces us to skip.
> >
> >d. Hold the ControlLock for the entirety of SimpleLruTruncate(). This removes
> > the TOCTOU race condition, but TransactionIdDidCommit() and other key
> > operations would be waiting behind filesystem I/O.

> >With both (a) and (b), the only way I'd know to reach the "apparent
> >wraparound" message is to restart in single-user mode and burn XIDs to the
> >point of bona fide wraparound. Hence, I propose to back-patch (a) and (b),
> >and I propose (c) for HEAD only. I don't want (d), which threatens
> >performance too much. I would rather not have (e), because I expect it's more
> >complex than (b) and fixes strictly less than (b) fixes.

> Seems reasonable, although I wonder how much more expensive would just
> doing (d) be. It seems by far the least complex solution, and it moves
> "just" the SlruScanDirectory() call before the lock. It's true it adds
> I/O requests, OTOH it's just unlink() without fsync() and I'd expect the
> number of files to be relatively low. Plus we already do SimpleLruWaitIO
> and SlruInternalWritePage in the loop.

Trivial read-only transactions often need CLogControlLock to check tuple
visibility. If an unlink() takes 1s, stalling read-only transactions for that
1s is a big problem. SimpleLruWaitIO() and SlruInternalWritePage() release
the control lock during I/O, then re-acquire it. (Moreover,
SimpleLruTruncate() rarely calls them. Calling them implies a page old enough
to truncate was modified after the most recent checkpoint.)

> BTW isn't that an issue that SlruInternalDeleteSegment does not do any
> fsync calls after unlinking the segments? If the system crashes/reboots
> before this becomes persistent (i.e. some of the segments reappear,
> won't that cause a problem)?

I think not; we could turn SlruInternalDeleteSegment() into a no-op, and the
only SQL-visible consequence would be extra disk usage. CheckPoint fields
tell the server what region of slru data is meaningful, and segments outside
that range merely waste disk space. (If that's not true and can't be made
true, we'd also need to stop ignoring the unlink() return value.)

> It's a bit unfortunate that a patch for a data corruption / loss issue
> (even if a low-probability one) fell through multiple commitfests.

Thanks for investing in steps to fix that.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-01-05 01:57:28 Re: pgsql: Add basic TAP tests for psql's tab-completion logic.
Previous Message Tomas Vondra 2020-01-05 01:10:34 Re: parallel vacuum options/syntax