race condition when writing pg_control

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: race condition when writing pg_control
Date: 2020-05-04 17:44:21
Message-ID: 70BF24D6-DC51-443F-B55A-95735803842A@amazon.com
Lists: pgsql-hackers

Hi hackers,

I believe I've discovered a race condition between the startup and
checkpointer processes that can cause a CRC mismatch in the pg_control
file. If a cluster crashes at the right time, the following error
appears when you attempt to restart it:

FATAL: incorrect checksum in control file

This appears to be caused by some code paths in xlog_redo() that
update ControlFile without taking the ControlFileLock. The attached
patch seems to be sufficient to prevent the CRC mismatch in the
control file, but perhaps this is a symptom of a bigger problem with
concurrent modifications of ControlFile->checkPointCopy.nextFullXid.
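
To illustrate the failure mode described above, here is a minimal
standalone sketch. It is not the attached patch and not PostgreSQL
code: DemoControlFile, the demo_* functions, and the FNV-1a checksum
are hypothetical stand-ins, and a pthread mutex plays the role of
ControlFileLock. The point is only that the checksum and the copy
written out must be produced while no other process can modify the
struct.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the control data; not PostgreSQL's struct. */
typedef struct
{
    uint64_t next_full_xid;
    uint32_t crc;               /* checksum over the fields before it */
} DemoControlFile;

static DemoControlFile demo_control;    /* shared in-memory copy */
static DemoControlFile demo_disk;       /* what "reached disk" */
static pthread_mutex_t demo_control_lock = PTHREAD_MUTEX_INITIALIZER;

/* FNV-1a over the bytes preceding crc; a stand-in for the real CRC. */
static uint32_t
demo_checksum(const DemoControlFile *cf)
{
    const unsigned char *p = (const unsigned char *) cf;
    uint32_t    h = 2166136261u;

    for (size_t i = 0; i < offsetof(DemoControlFile, crc); i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns 1 if the "on-disk" image matches its stored checksum. */
static int
demo_disk_is_valid(void)
{
    return demo_disk.crc == demo_checksum(&demo_disk);
}

/* Updater: changes the field only while holding the lock. */
static void
demo_set_next_xid(uint64_t xid)
{
    pthread_mutex_lock(&demo_control_lock);
    demo_control.next_full_xid = xid;
    pthread_mutex_unlock(&demo_control_lock);
}

/* Writer: checksum and copy happen under the same lock. */
static void
demo_write_control(void)
{
    pthread_mutex_lock(&demo_control_lock);
    demo_control.crc = demo_checksum(&demo_control);
    demo_disk = demo_control;
    assert(demo_disk_is_valid());   /* image written under the lock is consistent */
    pthread_mutex_unlock(&demo_control_lock);
}

/*
 * Deterministic replay of the bad interleaving: an unlocked update lands
 * between the writer's checksum computation and its copy to "disk".
 */
static void
demo_racy_interleaving(uint64_t xid)
{
    demo_control.crc = demo_checksum(&demo_control);    /* writer, step 1 */
    demo_control.next_full_xid = xid;                   /* unlocked update */
    demo_disk = demo_control;                           /* writer, step 2 */
}
```

Calling demo_set_next_xid() followed by demo_write_control() leaves a
valid image, whereas demo_racy_interleaving() leaves one whose stored
checksum no longer matches its contents, which is the kind of state
that would produce the error above after a crash.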

Nathan

Attachment: v1-0001-Prevent-race-condition-when-writing-pg_control.patch (application/octet-stream, 1.3 KB)
