On Tue, 2006-10-24 at 14:52 +0100, Simon Riggs wrote:
> On Tue, 2006-10-24 at 09:37 -0400, Tom Lane wrote:
> > "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > > On Mon, 2006-10-23 at 15:12 -0400, Tom Lane wrote:
> > >> There are actually three checks used to detect end of WAL: zero record
> > >> length, invalid checksum, and incorrect back-pointer. Zero length is
> > >> the first and cleanest-looking test, but AFAICS we have to have both of
> > >> the others to avoid obvious failure modes.
> > > The checksum protects from torn pages and disk errors. If you have
> > > full_page_writes set then you already believe yourself safe from torn
> > > pages and your disks could also already be CRC-checking the data.
> > No, because unlike tuples, WAL records can and do cross page boundaries.
> But not that often, with full_page_writes = off. So we could get away
> with just CRC checking the page-spanning ones and mark the records to
> show whether they have been CRC checked or not and need to be rechecked
> at recovery time. That would reduce the CRC overhead to about 1-5% of
> what it is now (as an option).
Looking further, I see that the xlog page header already contains
xlp_pageaddr which is a XLogRecPtr. So an xlrec that tried to span
multiple pages yet failed in between would easily show up as a failure
in ValidXLOGHeader(), even before the CRC check. [The XLogRecPtr
contains both the offset within the file and a unique identification of
the WAL file, so any data left over from previous uses of that data file
will be easily recognised as such].
So we don't even need to CRC check the page-spanning ones either.
So it all comes down to: do you trust your hardware? I accept that some
hardware/OS combinations will give you high risk, others will reduce it
considerably. All I'm looking to do is to pass on the savings for people
that are confident they have invested wisely in hardware.
Does anybody have a reasonable objection to introducing an option for
people that are comfortable they are making the correct choice?
wal_checksum = off (or other suggested naming...)
I don't want to take blind risks, so shoot me down, please, if I err.
I'm happy either way: either we speed up, or we're safer not to.
In response to
pgsql-hackers by date
|Next:||From: Alvaro Herrera||Date: 2006-10-24 17:51:47|
|Subject: Re: New CRC algorithm: Slicing by 8|
|Previous:||From: Alvaro Herrera||Date: 2006-10-24 17:47:34|
|Subject: Re: Incorrect behavior with CE and ORDER BY|