Re: Online checksums patch - once again

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Online checksums patch - once again
Date: 2020-12-03 09:37:58
Message-ID: 4D2BC45F-CAE9-451C-AD08-FDB199008E6D@yesql.se
Lists: pgsql-hackers

> On 25 Nov 2020, at 14:33, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> The lwlocking doesn't look right here. If ControlFile->data_checksum_version != PG_DATA_CHECKSUM_VERSION, LWLockAcquire is called twice without a LWLockRelease in between.

Right, fixed.
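The fix amounts to taking the lock once and making every path through the function release it. Below is a minimal, self-contained sketch of that pattern, assuming a mocked lock. `MockLWLock`, `mock_acquire`, `mock_release`, `MockControlFile` and `set_checksum_version` are hypothetical stand-ins for `LWLockAcquire`/`LWLockRelease` on the control file lock and for the patch's real code. The mock asserts on a double acquire, because LWLocks are not re-entrant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an LWLock: tracks whether it is held so that a
 * double acquire or an unbalanced release trips an assertion. */
typedef struct
{
    bool held;
} MockLWLock;

static void
mock_acquire(MockLWLock *lock)
{
    assert(!lock->held);        /* LWLocks are not re-entrant */
    lock->held = true;
}

static void
mock_release(MockLWLock *lock)
{
    assert(lock->held);         /* must not release a lock we don't hold */
    lock->held = false;
}

/* Hypothetical value standing in for PG_DATA_CHECKSUM_VERSION. */
#define MOCK_DATA_CHECKSUM_VERSION 1

typedef struct
{
    uint32_t data_checksum_version;
} MockControlFile;

/* Acquire once, update only if needed, release on every path.
 * Returns true if the version was changed. */
static bool
set_checksum_version(MockLWLock *lock, MockControlFile *cf)
{
    bool changed = false;

    mock_acquire(lock);
    if (cf->data_checksum_version != MOCK_DATA_CHECKSUM_VERSION)
    {
        cf->data_checksum_version = MOCK_DATA_CHECKSUM_VERSION;
        changed = true;
    }
    mock_release(lock);

    return changed;
}
```

Calling `set_checksum_version` twice in a row exercises both branches. The lock ends up released either way.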

> What if a checkpoint, and a crash, happens just after the WAL record has been written, but before the control file is updated? That's a ridiculously tight window for a whole checkpoint cycle to happen, but in principle I think that would spell trouble. I think you could set delayChkpt to prevent the checkpoint from happening in that window, similar to how we avoid this problem with the clog updates at commit. Also, I think this should be in a critical section; we don't want the process to error out in between for any reason, and if it does happen, it's panic time.

Good points. The attached patch performs the state changes inside a critical
section with checkpoints delayed. It also emits the barrier inside the critical
section, but waits for the barrier outside of it, to keep the critical section
as short as possible.
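The ordering described above can be sketched as a small simulation, assuming mocked backend state. `MockBackend`, the step functions and `change_checksum_state` are all hypothetical. In the real server they roughly correspond to `MyProc->delayChkpt`, `START_CRIT_SECTION()`/`END_CRIT_SECTION()`, the WAL insert, the control file update, and `EmitProcSignalBarrier()`/`WaitForProcSignalBarrier()`. The assertions encode the invariants from the discussion: the WAL record and control file update happen with checkpoints delayed and inside the critical section, while the potentially slow wait happens outside it. The relative order of clearing the checkpoint delay and ending the critical section is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-backend state: delay_chkpt mimics MyProc->delayChkpt,
 * crit_depth mimics the critical-section nesting counter. */
typedef struct
{
    bool   delay_chkpt;
    int    crit_depth;
    int    steps[8];
    size_t nsteps;
} MockBackend;

enum { STEP_WAL = 1, STEP_CONTROLFILE, STEP_EMIT_BARRIER, STEP_WAIT_BARRIER };

static void
record_step(MockBackend *b, int step)
{
    assert(b->nsteps < sizeof(b->steps) / sizeof(b->steps[0]));
    b->steps[b->nsteps++] = step;
}

/* WAL record and control file update must not be split by a checkpoint
 * or by an ERROR, hence both invariants. */
static void
write_wal_record(MockBackend *b)
{
    assert(b->delay_chkpt && b->crit_depth > 0);
    record_step(b, STEP_WAL);
}

static void
update_control_file(MockBackend *b)
{
    assert(b->delay_chkpt && b->crit_depth > 0);
    record_step(b, STEP_CONTROLFILE);
}

/* Emitting the barrier is cheap, so it stays inside the critical section. */
static void
emit_barrier(MockBackend *b)
{
    assert(b->crit_depth > 0);
    record_step(b, STEP_EMIT_BARRIER);
}

/* Waiting may take a long time, so it happens outside the critical section. */
static void
wait_for_barrier(MockBackend *b)
{
    assert(b->crit_depth == 0);
    record_step(b, STEP_WAIT_BARRIER);
}

static void
change_checksum_state(MockBackend *b)
{
    b->delay_chkpt = true;      /* keep a checkpoint out of the window */
    b->crit_depth++;            /* like START_CRIT_SECTION(): ERROR becomes PANIC */

    write_wal_record(b);
    update_control_file(b);
    emit_barrier(b);

    b->delay_chkpt = false;
    b->crit_depth--;            /* like END_CRIT_SECTION() */

    wait_for_barrier(b);
}
```

Moving `wait_for_barrier` above the critical-section exit would trip its assertion. This mirrors the reason for keeping the wait outside.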

I've also tweaked the tests to make them more robust, updated comments, and
done some general tidying up here and there.

cheers ./daniel

Attachment: online_checksums25.patch (application/octet-stream, 127.7 KB)
