Re: 9.3: summary of corruption detection / checksums / CRCs discussion

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.3: summary of corruption detection / checksums / CRCs discussion
Date: 2012-04-21 23:58:31
Message-ID: 1335052711.25680.112.camel@jdavis
Lists: pgsql-hackers

On Sun, 2012-04-22 at 00:08 +0100, Greg Stark wrote:
> On Sat, Apr 21, 2012 at 10:40 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> > * In addition to detecting random garbage, we also need to be able to
> > detect zeroing of pages. Right now, a zero page is not considered
> > corrupt, so that's a problem. We'll need to WAL table extension
> > operations, and we'll need to mitigate the performance impact of doing
> > so. I think we can do that by extending larger tables by many pages
> > (say, 16 at a time) so we can amortize the cost of WAL and avoid
> > contention.
>
> I haven't seen this come up in discussion.

I don't have any links, and it might just be based on in-person
discussions. I think it's just being left as a loose end for later, but
it will eventually need to be solved.
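
To make the zero-page problem concrete, here is a minimal sketch in Python. It is not PostgreSQL code: the page layout (a 4-byte CRC32 in front of the body), the function names, and the `allow_zero_page` flag are all assumptions made up for illustration. It shows only that a verifier which accepts all-zero pages as "new" will catch random garbage but not a page that was zeroed by mistake.

```python
import zlib

PAGE_SIZE = 8192      # PostgreSQL's default block size
CHECKSUM_BYTES = 4    # hypothetical layout: checksum stored in the first 4 bytes


def make_page(payload: bytes) -> bytes:
    """Build a page: 4-byte CRC32 of the body, followed by the zero-padded body."""
    body = payload.ljust(PAGE_SIZE - CHECKSUM_BYTES, b"\x00")
    return zlib.crc32(body).to_bytes(CHECKSUM_BYTES, "big") + body


def verify_page(page: bytes, allow_zero_page: bool = True) -> bool:
    """Return True if the page passes verification.

    With allow_zero_page=True an all-zero page is treated as a new, never
    written page and accepted without any checksum test. That models the
    current behaviour described above.
    """
    if page == bytes(PAGE_SIZE):
        return allow_zero_page
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "big")
    return stored == zlib.crc32(page[CHECKSUM_BYTES:])


page = make_page(b"tuple data")
garbled = page[:100] + bytes([page[100] ^ 0xFF]) + page[101:]
zeroed = bytes(PAGE_SIZE)

print(verify_page(page))      # intact page
print(verify_page(garbled))   # random garbage fails the checksum
print(verify_page(zeroed))    # zeroed page is accepted as "new"
```

Passing `allow_zero_page=False` stands in for a world where table extension is WAL-logged, so a zero page in a place that should hold data can be rejected.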

> WAL logging table
> extensions wouldn't by itself work because currently we treat the file
> size on disk as the size of the table. So you would have to do the
> extension in the critical section or else different backends might see
> the wrong file size and write out conflicting wal entries.

By "critical section", I assume you mean "while holding the relation
extension lock" not "while inside a CRITICAL_SECTION()", right?

There would be some synchronization overhead, to be sure, but I think it
can be done. Ideally, we'd be able to do large enough extensions that,
if there is a parallel bulk load on a single table or something, the
overhead could be made insignificant.
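
A toy model of the amortization argument, again in Python and again not PostgreSQL code: the `Relation` class, its fields, and `bulk_load` are hypothetical names, and the "WAL record" is just a counter. The sketch assumes one WAL record per extension and that every page hand-out happens under the relation extension lock, so concurrent backends never see an inconsistent relation size.

```python
import threading


class Relation:
    """Toy relation that grows by `batch_pages` pages per extension."""

    def __init__(self, batch_pages: int):
        self.batch_pages = batch_pages
        self.extension_lock = threading.Lock()  # stands in for the relation extension lock
        self.allocated_pages = 0  # pages that exist in the file
        self.next_free = 0        # pages already handed out to backends
        self.wal_records = 0      # one record per extension (assumption)

    def get_new_page(self) -> int:
        """Hand out the next unused page, extending the file when it runs out."""
        with self.extension_lock:
            if self.next_free == self.allocated_pages:
                self.wal_records += 1                     # log the extension first
                self.allocated_pages += self.batch_pages  # then grow the file
            page_no = self.next_free
            self.next_free += 1
            return page_no


def bulk_load(rel: Relation, workers: int, pages_per_worker: int) -> list:
    """Run a parallel load against one relation; return every page number used."""
    results = [[] for _ in range(workers)]

    def worker(slot: int) -> None:
        for _ in range(pages_per_worker):
            results[slot].append(rel.get_new_page())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [p for chunk in results for p in chunk]


for batch in (1, 16):
    rel = Relation(batch_pages=batch)
    pages = bulk_load(rel, workers=4, pages_per_worker=100)
    print(batch, len(set(pages)), rel.wal_records)
# prints "1 400 400" then "16 400 25"
```

The model still takes the lock for every page, so it shows only the reduction in WAL records, not the reduction in lock contention; a real design would presumably hand out the pre-extended pages some other way.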

I didn't intend to get too far into the details in this thread, but if
it's a totally ridiculous or impossible idea, I'll remove it.

> The earlier consensus was to move all the hint bits to a dedicated
> area and exclude them from the checksum. I think double-write buffers
> seem to have become more fashionable but a summary that doesn't
> describe the former is definitely incomplete.

Thank you, that's the kind of omission I was looking to catch.

> That link points to the MVCC-safe truncate patch. I don't follow how
> optimizations in bulk loads are relevant to wal logging hint bit
> updates.

I should have linked to these messages:
http://archives.postgresql.org/message-id/CA+TgmoYLOzDezzJKyJ8_x2bPeEerAo5dJ-OMvS1fLQOQSQP5jg(at)mail(dot)gmail(dot)com
http://archives.postgresql.org/message-id/CA+Tgmoa4Xs1jbZhm=pb9Xi4AGMJXRB2a4GSE9EJtLo=70Zne=g(at)mail(dot)gmail(dot)com

Though perhaps I'm reading too much into Robert's comments.

Regards,
Jeff Davis
