Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Date: 2019-02-14 07:26:23
Message-ID: 20190214072623.GA1139206@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
> The main consequence is the false alarm.

That conclusion was incorrect. On further study, I was able to reproduce data
loss via either of two weaknesses in the "apparent wraparound" test:

1. The result of the test is valid only until we release the SLRU ControlLock,
which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
segments for deletion. Once we release that lock, latest_page_number can
advance. This creates a TOCTOU race condition, allowing excess deletion:

[local] test=# table trunc_clog_concurrency ;
ERROR: could not access status of transaction 2149484247
DETAIL: Could not open file "pg_xact/0801": No such file or directory.

2. By the time the "apparent wraparound" test fires, we've already WAL-logged
the truncation. clog_redo() suppresses the "apparent wraparound" test,
then deletes too much. Startup then fails:

881997 2019-02-10 02:53:32.105 GMT FATAL: could not access status of transaction 708112327
881997 2019-02-10 02:53:32.105 GMT DETAIL: Could not open file "pg_xact/02A3": No such file or directory.
881855 2019-02-10 02:53:32.107 GMT LOG: startup process (PID 881997) exited with exit code 1

Fixes are available:

a. Fix the rounding in SimpleLruTruncate(). (The patch I posted upthread is
wrong; I will correct it in a separate message.)

b. Arrange so only one backend runs vac_truncate_clog() at a time. Other than
AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
checkpoint, or in the startup process. Hence, also arrange for only one
backend to call SimpleLruTruncate(AsyncCtl) at a time.

c. Test "apparent wraparound" before writing WAL, and don't WAL-log
truncations that "apparent wraparound" forces us to skip.

d. Hold the ControlLock for the entirety of SimpleLruTruncate(). This removes
the TOCTOU race condition, but TransactionIdDidCommit() and other key
operations would be waiting behind filesystem I/O.

e. Have the SLRU track a "low cutoff" for an ongoing truncation. Initially,
the low cutoff is the page furthest in the past relative to cutoffPage (the
"high cutoff"). If SimpleLruZeroPage() wishes to use a page in the
truncation range, it would acquire an LWLock and increment the low cutoff.
Before unlinking any segment, SlruScanDirCbDeleteCutoff() would take the
same LWLock and recheck the segment against the latest low cutoff.

With both (a) and (b), the only way I'd know to reach the "apparent
wraparound" message is to restart in single-user mode and burn XIDs to the
point of bona fide wraparound. Hence, I propose to back-patch (a) and (b),
and I propose (c) for HEAD only. I don't want (d), which threatens
performance too much. I would rather not have (e), because I expect it's more
complex than (b) and fixes strictly less than (b) fixes.

Can you see a way to improve on that plan? Can you see other bugs of this
nature that this plan does not fix?

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2019-02-14 07:32:44 Re: replace_text optimization (StringInfo to varlena)
Previous Message Tsunakawa, Takayuki 2019-02-14 07:23:18 RE: Protect syscache from bloating with negative cache entries