Re: Possible corruption by CreateRestartPoint at promotion

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: nathandbossart(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, masao(dot)fujii(at)oss(dot)nttdata(dot)com
Subject: Re: Possible corruption by CreateRestartPoint at promotion
Date: 2022-05-09 00:24:06
Message-ID: YnhfJir8WYRIWgAs@paquier.xyz
Lists: pgsql-hackers

On Fri, May 06, 2022 at 07:58:43PM +0900, Michael Paquier wrote:
> And I have spent a bit of time on this stuff to finish with the attached. It
> will be a plus to get that done on HEAD for beta1, so I'll try to deal
> with it on Monday. I am still a bit stressed about the back branches
> as concurrent checkpoints are possible via CreateCheckPoint() from the
> startup process (not the case of HEAD), but the stable branches will
> have a new point release very soon so let's revisit this choice there
> later. v6 attached includes a TAP test, but I don't intend to include
> it as it is expensive.

Okay, applied this one on HEAD after going back-and-forth on it for
the last couple of days. I have found myself shaping the patch in
what looks like its simplest form, by applying the check based on an
older checkpoint to all the fields updated in the control file, with
the check on DB_IN_ARCHIVE_RECOVERY applying to the addition of
DB_SHUTDOWNED_IN_RECOVERY (I was initially surprised that this had
side effects on pg_rewind) and the minRecoveryPoint
calculations.

Now, it would be nice to get a test for this stuff, and we are going
to need something cheaper than what's been proposed upthread. This
comes down to being able to put a deterministic stop in a restart
point while it is processing, meaning that we need to interact with
one of the internal routines of CheckPointGuts(). One fancy way to do
so would be to forcibly take an LWLock so that the restart point gets
stuck until the lock is released. Using a SQL function for that would
be possible, if not artistic. Perhaps we don't need such a function
though, if we could arbitrarily block the internals of a checkpoint?
Any ideas?
--
Michael
