Re: race condition when writing pg_control

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition when writing pg_control
Date: 2020-05-31 21:11:35
Message-ID: A9F111A3-560A-4C21-A35B-9D3182704187@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 5/29/20, 12:24 AM, "Fujii Masao" <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> On 2020/05/27 16:10, Michael Paquier wrote:
>> On Tue, May 26, 2020 at 07:30:54PM +0000, Bossart, Nathan wrote:
>>> While an assertion in UpdateControlFile() would not have helped us
>>> catch the problem I initially reported, it does seem worthwhile to add
>>> it. I have attached a patch that adds this assertion and also
>>> attempts to fix XLogReportParameters(). Since there is only one place
>>> where we feel it is safe to call UpdateControlFile() without a lock, I
>>> just changed it to take the lock. I don't think this adds any sort of
>>> significant contention risk, and IMO it is a bit cleaner than the
>>> boolean flag.
>>
>> Let's see what Fujii-san and Thomas think about that. I'd rather
>> avoid taking a lock here because we don't need it and because it makes
>> things IMO confusing with the beginning of StartupXLOG() where a lot
>> of the fields are read, even if we go without this extra assertion.
>
> I have no strong opinion about this, but I tend to agree with Michael here.
>
>>> For the XLogReportParameters() fix, I simply added an exclusive lock
>>> acquisition for the portion that updates the values in shared memory
>>> and calls UpdateControlFile(). IIUC the first part of this function
>>> that accesses several ControlFile values should be safe, as none of
>>> them can be updated after server start.
>>
>> They can get updated when replaying a XLOG_PARAMETER_CHANGE record.
>> But you are right as all of this happens in the startup process, so
>> your patch looks right to me here.
>
> LGTM.

Thanks for the feedback. I've attached a new set of patches.

Nathan

Attachment Content-Type Size
0003-Assert-that-ControlFileLock-is-held-within-UpdateCon.patch application/octet-stream 5.9 KB
0002-Acquire-ControlFileLock-within-XLogReportParameters.patch application/octet-stream 1.4 KB
0001-Fix-race-condition-that-could-corrupt-pg_control.patch application/octet-stream 1.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-05-31 21:47:01 Re: Incorrect comment in be-secure-openssl.c
Previous Message Vik Fearing 2020-05-31 20:02:29 Re: Compatible defaults for LEAD/LAG