|From:||Daniel Gustafsson <daniel(at)yesql(dot)se>|
|To:||Heikki Linnakangas <hlinnaka(at)iki(dot)fi>|
|Cc:||Michael Banck <michael(dot)banck(at)credativ(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: Online checksums patch - once again|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
> On 9 Feb 2021, at 09:54, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> (I may have said this before, but) My overall high-level impression of this patch is that it's really cmmplex for a feature that you use maybe once in the lifetime of a cluster.
The frequency of using a feature seems like a suboptimal metric of the value
and usefulness of it. I don't disagree that this patch is quite invasive and
complicated though, that's a side-effect of performing global state changes in
a distributed system which is hard to get around.
As was discussed downthread, some of the complexity stems from restartability
and the catalog state persistence. The attached version does away with this to
see how big the difference is. Personally I don't think it's enough to move
the needle but mmv.
> I'm happy to review but I'm not planning to commit this myself.
Regardless, thanks for all review and thoughtful comments!
>> doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
> Why does this use DataChecksumsOnInProgress() instead of DataChecksumsNeedWrite()? If checksums are enabled, you always need full-page writes, don't you? If not, then why is it needed in the inprogress-on state?
Correct, that's a thinko, it should be DataChecksumsNeedWrite.
> We also set doPageWrites in InitXLOGAccess(). That should match the condition above (although it doesn't matter for correctness).
Good point, fixed.
>> I think you need to hold the interrupts *before* the smgrread() call.
>> * Set checksum for a page in private memory.
>> * This must only be used when we know that no other process can be modifying
>> * the page buffer.
>> PageSetChecksumInplace(Page page, BlockNumber blkno)
>> /* If we don't need a checksum, just return */
>> if (PageIsNew(page) || !DataChecksumsNeedWrite())
>> ((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
> The checksums state might change just after this call, before the caller has actually performed the smgrwrite() or smgrextend() call. The caller needs to hold interrupts across this function and the smgrwrite/smgrextend() call. It is a bad idea to HOLD_INTERRUPTS() here, because that just masks bugs where the caller isn't holding the interrupts. Same in PageSetChecksumCopy().
Looking at the cases which could happen here in case of the below state change:
B <---- state change
The possible state transitions which can happen, and the consequences of them
1) B = (off -> inprogress-on): We will write without a checksum. This relation
will then be processed by the DatachecksumsWorker. Readers will not verify
2) B = (inprogress-on -> on): We will write with a checksum. Both those states
write checksums. Readers will verify the checksum.
3) B = (on -> inprogress-off): We will write with a checksum. Both these
states write checksums. Readers will not verify checksums.
4) B = (inprogress-off -> off): We will write with a checksum which is wasteful
and strictly speaking incorrect. Readers will not verify checksums.
The above assume there is only a single state transition between the call to
PageSetChecksumInPlace and smgrwrite/extend, which of course isn't guaranteed
to be the case (albeit a lot more likely than two).
The (off -> inprogress-on -> on) and (inprogress-on -> on -> inprogress-off)
transitions cannot happen here since the DatachecksumWorker will wait for
ongoing transactions before a transition to on can be initiated.
11) B = (on -> inprogress-off -> off): the checksum will be written when it
shouldn't be, but readers won't verify it.
22) B = (inprogress-off -> off -> inprogress-on): checksums are written without
being verified in both these states.
So these cornercases can happen but ending up with an incorrect verification is
likely hard, which is probably why they haven't shook out during testing. I've
updated to patch to hold interrupts across the smgrwrite/extend call.
Daniel Gustafsson https://vmware.com/
|Next Message||er||2021-02-15 12:44:51||Re: logical replication seems broken|
|Previous Message||vignesh C||2021-02-15 12:06:38||Re: logical replication seems broken|