Re: SimpleLruTruncate() mutual exclusion

From: Noah Misch <noah(at)leadboat(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SimpleLruTruncate() mutual exclusion
Date: 2019-11-18 06:14:26
Message-ID: 20191118061426.GA1173376@rfd.leadboat.com
Lists: pgsql-hackers

On Sun, Nov 17, 2019 at 12:56:52PM +0100, Dmitry Dolgov wrote:
> > On Wed, Jul 31, 2019 at 11:51:17PM -0700, Noah Misch wrote:
> > > Also, if I understand the data-loss hazard properly, it's what you
> > > said in the other thread: the latest_page_number could advance after
> > > we make our decision about what to truncate, and then maybe we could
> > > truncate newly-added data. We surely don't want to lock out the
> > > operations that can advance latest_page_number, so how does serializing
> > > vac_truncate_clog help?
> > >
> > > I wonder whether what we need to do is add some state in shared
> > > memory saying "it is only safe to create pages before here", and
> > > make SimpleLruZeroPage or its callers check that. The truncation
> > > logic would advance that value, but only after completing a scan.
> > > (Oh ..., hmm, maybe the point of serializing truncations is to
> > > ensure that we know that we can safely advance that?)
> >
> > vac_truncate_clog() already records "it is only safe to create pages before
> > here" in ShmemVariableCache->xidWrapLimit, which it updates after any unlinks.
> > The trouble comes when two vac_truncate_clog() run in parallel and you get a
> > sequence of events like this:
> >
> > vac_truncate_clog() instance 1 starts, considers segment ABCD eligible to unlink
> > vac_truncate_clog() instance 2 starts, considers segment ABCD eligible to unlink
> > vac_truncate_clog() instance 1 unlinks segment ABCD
> > vac_truncate_clog() instance 1 calls SetTransactionIdLimit()
> > vac_truncate_clog() instance 1 finishes
> > some backend calls SimpleLruZeroPage(), creating segment ABCD
> > vac_truncate_clog() instance 2 unlinks segment ABCD
> >
> > Serializing vac_truncate_clog() fixes that.
>
> I'm probably missing something, so just wanted to clarify. Do I
> understand correctly that thread [1] and this one are independent, and
> that the scenario above assumes we're at the end of the XID space,
> but not necessarily hitting rounding issues? I'm a bit confused, since
> the reproduction script does something with cutoffPage, and I'm not sure
> whether that's important or not.

I think the repro recipe contained an early fix for the other thread's bug.
While the two bugs are independent in principle, I likely never reproduced
this one without a fix for the other already in place; the other thread's
bug was easier to hit.
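
To restate the quoted interleaving against the relevant call sites, here
is a condensed, annotated timeline. TransactionIdToPage, ClogCtl,
SimpleLruTruncate, and SimpleLruZeroPage are from clog.c/slru.c, but the
eligibility decision is folded into a single line here for illustration;
this is not the literal code path:

  /* instance 1 */ cutoffPage = TransactionIdToPage(frozenXID);
  /* instance 2 */ cutoffPage = TransactionIdToPage(frozenXID); /* ABCD also eligible */
  /* instance 1 */ SimpleLruTruncate(ClogCtl, cutoffPage);      /* unlinks ABCD */
  /* instance 1 */ SetTransactionIdLimit(frozenXID, oldest_datoid);
  /* a backend  */ SimpleLruZeroPage(ClogCtl, pageno);          /* recreates ABCD */
  /* instance 2 */ SimpleLruTruncate(ClogCtl, cutoffPage);      /* unlinks live ABCD */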

> > > Can you post whatever script you used to detect/reproduce the problem
> > > in the first place?
> >
> > I've attached it, but it's pretty hackish. Apply the patch atop commit 7170268,
> > make sure your --bindir is in $PATH, copy "conf-test-pg" to your home
> > directory, and run "make trunc-check". This incorporates xid-burn
> > acceleration code that Jeff Janes shared with -hackers at some point.
>
> I've tried to use it to understand the problem better, but somehow
> couldn't reproduce the failure via the suggested script. I've applied it
> to 7170268 (and also tried it rebased onto master, with the same results)
> with conf-test-pg in place, but after going through all the steps there
> are no errors like:

That is unfortunate.

> Is there anything extra one needs to do to reproduce the problem, maybe
> adjust delays or something?

It wouldn't surprise me. I did all my testing on one or two systems; the
hard-coded delays suffice there, but other systems may well need different
ones.

Though I did reproduce this bug, I'm motivated by the abstract problem more
than any particular way to reproduce it. Commit 996d273 inspired me; by
removing a GetCurrentTransactionId(), it allowed the global xmin to advance at
times it previously could not. That subtly changed the concurrency
possibilities. I think safe, parallel SimpleLruTruncate() is difficult to
maintain and helps too rarely to justify such maintenance. That's why I
propose eliminating the concurrency.
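
Concretely, eliminating the concurrency could be as simple as one lwlock
held across the whole operation. A minimal sketch, assuming a hypothetical
WrapLimitsVacuumLock; the name and exact placement are illustrative, not a
final patch:

  static void
  vac_truncate_clog(TransactionId frozenXID, ...)
  {
      /*
       * Hypothetical lock: allow only one truncator at a time.  Holding
       * it from the unlink decision through SetTransactionIdLimit()
       * means no second instance can act on a stale decision and unlink
       * a segment that a backend has recreated in the meantime.
       */
      LWLockAcquire(WrapLimitsVacuumLock, LW_EXCLUSIVE);

      /* ... compute the minimum datfrozenxid across pg_database ... */
      /* ... TruncateCLOG(frozenXID, oldestxid_datoid); ... */

      SetTransactionIdLimit(frozenXID, oldestxid_datoid);

      LWLockRelease(WrapLimitsVacuumLock);
  }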

> [1]: https://www.postgresql.org/message-id/flat/20190202083822.GC32531%40gust.leadboat.com
