Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Mon, 2009-11-30 at 16:49 -0500, Aidan Van Dyk wrote:
>> No, I believe the torn-page problem is exactly the thing that made the
>> checksum talks stall out last time... The torn page isn't currently a
>> problem on only-hint-bit-dirty writes, because if you get
>> half-old/half-new, the only change is the hint bit - no big loss, the
>> data is still the same.
> A good argument, but we're missing some proportion.
No, I think you are. The problem with the described behavior is exactly
that it converts a non-problem into a problem --- a big problem, in
fact: uncorrectable data loss. Loss of hint bits is expected and
tolerated in the current system design. But a block with bad CRC is not
going to have any automated recovery path.
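
To make the failure mode above concrete, here is a minimal sketch of a torn hint-bit write in Python. It is an illustration only and does not reflect PostgreSQL's actual page layout or any proposed patch: the 8192-byte page, the 512-byte atomic write unit, the CRC stored in the first four bytes, and the hint-bit offset are all assumptions made for the example, and every function name is hypothetical.

```python
import zlib

# All sizes and offsets below are assumptions made for illustration.
PAGE_SIZE = 8192     # assumed page size in bytes
SECTOR = 512         # assumed unit the disk writes atomically
CRC_BYTES = 4        # hypothetical layout: CRC32 lives in the first 4 bytes
HINT_OFFSET = 6000   # hypothetical byte holding a hint bit, late in the page


def with_crc(body: bytes) -> bytes:
    """Build a page: CRC32 of the body followed by the body itself."""
    return zlib.crc32(body).to_bytes(CRC_BYTES, "big") + body


def crc_ok(page: bytes) -> bool:
    """Check the stored CRC against the CRC of the page body."""
    stored = int.from_bytes(page[:CRC_BYTES], "big")
    return stored == zlib.crc32(page[CRC_BYTES:])


def set_hint_bit(page: bytes) -> bytes:
    """Set the hint bit and recompute the CRC, as a checksummed write would."""
    body = bytearray(page[CRC_BYTES:])
    body[HINT_OFFSET - CRC_BYTES] |= 0x01
    return with_crc(bytes(body))


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash after only the first sectors of `new` reached disk."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


def tuple_data(page: bytes) -> bytes:
    """Everything except the CRC and the hint byte: the data users care about."""
    return page[CRC_BYTES:HINT_OFFSET] + page[HINT_OFFSET + 1:]


if __name__ == "__main__":
    old = with_crc(b"\x42" * (PAGE_SIZE - CRC_BYTES))
    new = set_hint_bit(old)
    torn = torn_write(old, new, sectors_written=1)

    print("old page CRC valid: ", crc_ok(old))
    print("new page CRC valid: ", crc_ok(new))
    print("torn page CRC valid:", crc_ok(torn))
    print("tuple data unchanged:", tuple_data(torn) == tuple_data(old))
```

In this sketch the torn page carries the new CRC in its first sector but the old hint byte further in, so the CRC check fails even though the tuple data is byte-for-byte identical to the old page. Without the CRC the same torn write would merely lose a hint bit.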
So the difficulty is that in the name of improving system reliability
by detecting infrequent corruption events, we'd be decreasing system
reliability by *creating* infrequent corruption events, added onto
whatever events we were hoping to detect. There is no strong argument
you can make that this isn't a net loss --- you'd need to pull some
error-rate numbers out of the air to even try to make the argument,
and in any case the fact remains that more data gets lost with the CRC
than without it. The only thing the CRC is really buying is giving
the PG project a more plausible argument for blaming data loss on
somebody else; it's not helping the user whose data got lost.
It's hard to justify the amount of work and performance hit we'd take
to obtain a "feature" like that.
regards, tom lane