|From:||David Steele <david(at)pgmasters(dot)net>|
|To:||Michael Banck <michael(dot)banck(at)credativ(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org|
|Cc:||Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>|
|Subject:||Re: [PATCH] Verify Checksums during Basebackups|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On 3/23/18 5:36 AM, Michael Banck wrote:
> Am Donnerstag, den 22.03.2018, 12:22 -0400 schrieb David Steele:
>> + if (phdr->pd_checksum != checksum)
>> I've attached a patch that adds basic retry functionality. It's not
>> terrible efficient since it rereads the entire buffer for any block
>> error. A better way is to keep a bitmap for each block in the buffer,
>> then on retry compare bitmaps. If the block is still bad, report it.
>> If the block was corrected moved on. If a block was good before but is
>> bad on retry it can be ignored.
> I have to admit I find it a bit convoluted and non-obvious on first
> reading, but I'll try to check it out some more.
Yeah, I think I was influenced too much by how pgBackRest does things,
which doesn't work as well here. Attached is a simpler version.
> I think it would be very useful if we could come up with a testcase
> showing this problem, but I guess this will be quite hard to hit
> reproducibly, right?
This was brought up by Robert in  when discussing validating
checksums during backup. I don't know of any way to reproduce this
issue but it seems perfectly possible, if highly unlikely.
>> + ereport(WARNING,
>> + (errmsg("checksum verification failed in file "
>> I'm worried about how verbose this warning could be since there are
>> 131,072 blocks per segment. It's unlikely to have that many block
>> errors, but users do sometimes put files in PGDATA which look like they
>> should be validated. Since these warnings all go to the server log it
>> could get pretty bad.
> We only verify checksums of files in tablespaces, and I don't think
> dropping random files in those is supported in any way, as opposed to
> maybe the top-level PGDATA directory. So I would say that this is not a
> real concern.
Perhaps, but a very corrupt file is still going to spew lots of warnings
into the server log.
>> We should either stop warning after the first failure, or aggregate the
>> failures for a file into a single message.
> I agree that major corruption could make the whole output blow up but I
> would prefer to keep this feature simple for now, which implies possibly
> printing out a lot of WARNING or maybe just stopping after the first
> one (or first few, dunno).
In my experience actual block errors are relatively rare, so there
aren't likely to be more than a few in a file. More common are
overwritten or transposed files, rogue files, etc. These produce a lot
Maybe stop after five?
|Next Message||Amit Kapila||2018-03-23 14:59:50||Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)|
|Previous Message||Haribabu Kommi||2018-03-23 14:49:28||Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types|