Re: SimpleLruTruncate() mutual exclusion

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SimpleLruTruncate() mutual exclusion
Date: 2019-08-01 06:51:17
Message-ID: 20190801065117.GA2355684@rfd.leadboat.com
Lists: pgsql-hackers

On Mon, Jul 29, 2019 at 12:58:17PM -0400, Tom Lane wrote:
> > On Sun, Feb 17, 2019 at 11:31:03PM -0800, Noah Misch wrote:
> >>> b. Arrange so only one backend runs vac_truncate_clog() at a time. Other than
> >>> AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
> >>> checkpoint, or in the startup process. Hence, also arrange for only one
> >>> backend to call SimpleLruTruncate(AsyncCtl) at a time.
>
> >> More specifically, restrict vac_update_datfrozenxid() to one backend per
> >> database, and restrict vac_truncate_clog() and asyncQueueAdvanceTail() to one
> >> backend per cluster. This, attached, was rather straightforward.

> I'm pretty sure it was intentional that multiple backends
> could run truncation directory scans concurrently, and I don't really
> want to give that up if possible.

I want to avoid a design that requires painstaking concurrency analysis. Such
analysis is worth it for heap_update(), but vac_truncate_clog() isn't enough
of a hot path. If there's a way to make vac_truncate_clog() easy to analyze
and still somewhat concurrent, that would be fair.

> Also, if I understand the data-loss hazard properly, it's what you
> said in the other thread: the latest_page_number could advance after
> we make our decision about what to truncate, and then maybe we could
> truncate newly-added data. We surely don't want to lock out the
> operations that can advance latest_page_number, so how does serializing
> vac_truncate_clog help?
>
> I wonder whether what we need to do is add some state in shared
> memory saying "it is only safe to create pages before here", and
> make SimpleLruZeroPage or its callers check that. The truncation
> logic would advance that value, but only after completing a scan.
> (Oh ..., hmm, maybe the point of serializing truncations is to
> ensure that we know that we can safely advance that?)

vac_truncate_clog() already records "it is only safe to create pages before
here" in ShmemVariableCache->xidWrapLimit, which it updates after any unlinks.
The trouble comes when two vac_truncate_clog() instances run in parallel and
you get a sequence of events like this:

vac_truncate_clog() instance 1 starts, considers segment ABCD eligible to unlink
vac_truncate_clog() instance 2 starts, considers segment ABCD eligible to unlink
vac_truncate_clog() instance 1 unlinks segment ABCD
vac_truncate_clog() instance 1 calls SetTransactionIdLimit()
vac_truncate_clog() instance 1 finishes
some backend calls SimpleLruZeroPage(), creating segment ABCD
vac_truncate_clog() instance 2 unlinks segment ABCD

Serializing vac_truncate_clog() fixes that.

> Can you post whatever script you used to detect/reproduce the problem
> in the first place?

I've attached it, but it's pretty hackish. Apply the patch on commit 7170268,
make sure your --bindir is in $PATH, copy "conf-test-pg" to your home
directory, and run "make trunc-check". This incorporates xid-burn
acceleration code that Jeff Janes shared with -hackers at some point.

> PS: also, re-reading this code, it's worrisome that we are not checking
> for failure of the unlink calls. I think the idea was that it didn't
> really matter because if we did re-use an existing file we'd just
> re-zero it; hence failing the truncate is an overreaction. But probably
> some comments about that are in order.

That's my understanding. We'll zero any page before reusing it. Failing the
whole vac_truncate_clog() (and therefore not advancing the wrap limit) would
do more harm than the bit of wasted disk space. Still, a LOG or WARNING
message would be fine, I think.

Thanks,
nm

Attachment Content-Type Size
repro-vac_truncate_clog-race-v0.patch text/plain 13.9 KB
conf-test-pg text/plain 832 bytes
