|From:||Daniel Gustafsson <daniel(at)yesql(dot)se>|
|To:||Heikki Linnakangas <hlinnaka(at)iki(dot)fi>|
|Cc:||Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org|
|Subject:||Re: Online checksums patch - once again|
> On 25 Nov 2020, at 14:33, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> The lwlocking doesn't look right here. If ControlFile->data_checksum_version != PG_DATA_CHECKSUM_VERSION, LWLockAcquire is called twice without a LWLockRelease in between.
> What if a checkpoint, and a crash, happens just after the WAL record has been written, but before the control file is updated? That's a ridiculously tight window for a whole checkpoint cycle to happen, but in principle I think that would spell trouble. I think you could set delayChkpt to prevent the checkpoint from happening in that window, similar to how we avoid this problem with the clog updates at commit. Also, I think this should be in a critical section; we don't want the process to error out in between for any reason, and if it does happen, it's panic time.
Good points. The attached patch performs the state changes inside a critical
section with checkpoints delayed. It also emits the barrier inside the
critical section but waits for it outside, to keep the critical section open
for as short a time as possible.

I've also tweaked the tests to make them more robust, updated comments, and
done some general tidying up here and there.
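
To make the ordering described above concrete, here is a minimal, self-contained toy model in C. It is only a sketch and not code from the patch: every name in it (`set_checksum_state`, `write_wal_record`, `update_control_file`, `emit_barrier`, `wait_for_barrier`, and the static variables) is a hypothetical stand-in, and the placement of the barrier emit after the control file update is an assumption. The asserts check that the WAL write and control file update only happen with checkpoints delayed and inside the critical section, and that the barrier wait only happens after leaving it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-ins for backend state; none of this is PostgreSQL code. */
static bool delay_chkpt = false;       /* models a "delay checkpoint" flag */
static int  crit_section_depth = 0;    /* models critical section nesting */
static int  control_file_version = 0;  /* models the checksum state on disk */
static char event_log[256] = "";       /* records the order of the steps */

static void log_event(const char *name)
{
    strncat(event_log, name, sizeof(event_log) - strlen(event_log) - 1);
    strncat(event_log, ";", sizeof(event_log) - strlen(event_log) - 1);
}

/* Hypothetical: write the WAL record describing the state change. */
static void write_wal_record(int new_version)
{
    assert(crit_section_depth > 0 && delay_chkpt);
    (void) new_version;
    log_event("wal");
}

/* Hypothetical: persist the new state in the control file. */
static void update_control_file(int new_version)
{
    assert(crit_section_depth > 0 && delay_chkpt);
    control_file_version = new_version;
    log_event("controlfile");
}

/* Hypothetical: signal other backends; returns a generation to wait on. */
static unsigned long emit_barrier(void)
{
    assert(crit_section_depth > 0);
    log_event("emit_barrier");
    return 1;
}

/* Hypothetical: block until all backends have absorbed the barrier. */
static void wait_for_barrier(unsigned long generation)
{
    /* Waiting can take a while, so it must happen outside the section. */
    assert(crit_section_depth == 0);
    (void) generation;
    log_event("wait_barrier");
}

static void set_checksum_state(int new_version)
{
    unsigned long barrier;

    delay_chkpt = true;        /* no checkpoint may complete in this window */
    crit_section_depth++;      /* any error from here on would be a PANIC */

    write_wal_record(new_version);
    update_control_file(new_version);
    barrier = emit_barrier();

    crit_section_depth--;      /* leave the critical section ... */
    delay_chkpt = false;       /* ... and allow checkpoints again */

    wait_for_barrier(barrier); /* the slow part, done outside */
}
```

Calling `set_checksum_state(1)` in this model leaves `control_file_version` at 1 and records the steps in the order WAL record, control file, barrier emit, barrier wait.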