Re: Enabling Checksums

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enabling Checksums
Date: 2013-01-27 04:23:02
Message-ID: CA+TgmoaBj9t3_mCDuc21m+b3i+yM+vekmaSDSM8vr8jOyuq2QQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 25, 2013 at 9:35 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Fri, 2013-01-25 at 15:29 -0500, Robert Haas wrote:
>> I thought Simon had the idea, at some stage, of writing a WAL record
>> to cover hint-bit changes only at the time we *write* the buffer and
>> only if no FPI had already been emitted that checkpoint cycle. I'm
>> not sure whether that approach was sound, but if so it seems more
>> efficient than this approach.
>
> My patch is based on his original idea; although I've made quite a lot
> of changes, I believe that I have stuck to his same basic design w.r.t.
> WAL.
>
> This patch does not cause a new FPI to be emitted if one has already
> been emitted this cycle. It also does not emit a WAL record at all if an
> FPI has already been emitted.
>
> If we were to try to defer writing the WAL until the page was being
> written, the most it would possibly save is the small XLOG_HINT WAL
> record; it would not save any FPIs.
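The rule Jeff describes (at most one FPI per page per checkpoint cycle, and no hint record at all once that FPI exists) can be sketched as a toy model. This is not PostgreSQL's code: `Buffer`, `Wal`, `log_fpi`, and `set_hint_bit_logged` are hypothetical names. The only real idea borrowed is the usual full_page_writes test, under which a page still needs an FPI in this cycle if its LSN is at or before the checkpoint's redo pointer. The model also omits the "buffer already dirty" shortcut.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model; all names are hypothetical, not PostgreSQL APIs. */
typedef uint64_t Lsn;            /* stand-in for XLogRecPtr */

#define FPI_BYTES 8192           /* one full page at the default BLCKSZ */

typedef struct {
    Lsn  page_lsn;               /* LSN of the last WAL record covering the page */
    bool dirty;
} Buffer;

typedef struct {
    Lsn redo_ptr;                /* redo pointer of the current checkpoint */
    Lsn insert_ptr;              /* end of WAL inserted so far */
    int fpis_emitted;
} Wal;

/* No FPI yet this cycle iff the page was last WAL-logged at or before redo. */
static bool fpi_needed(const Wal *wal, const Buffer *buf)
{
    return buf->page_lsn <= wal->redo_ptr;
}

/* Append a record carrying a full-page image; return its end LSN. */
static Lsn log_fpi(Wal *wal)
{
    assert(wal->insert_ptr >= wal->redo_ptr);
    wal->insert_ptr += FPI_BYTES;
    wal->fpis_emitted++;
    return wal->insert_ptr;
}

/* Hypothetical hook run when a hint bit is set with checksums enabled. */
static void set_hint_bit_logged(Wal *wal, Buffer *buf)
{
    if (fpi_needed(wal, buf))
        buf->page_lsn = log_fpi(wal);  /* first change this cycle: emit FPI */
    /* else: the FPI already taken this cycle protects the page; no WAL */
    buf->dirty = true;
}
```

Setting a second hint bit on the same page in the same cycle emits nothing, which is the property described above.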

How is the XLOG_HINT WAL record kept small, and why does it not itself
require an FPI?

> At first glance, it seems sound as long as the WAL FPI makes it to disk
> before the data. But to meet that requirement, it seems like we'd need
> to write an FPI and then immediately flush WAL before cleaning a page,
> and that doesn't seem like a win. Do you (or Simon) see an opportunity
> here that I'm missing?

I am not sure that isn't a win. After all, we may already need to flush
WAL before flushing a buffer, so this just adds another case; and the
payoff is that, if the initial access to a page (setting hint bits) is
quickly followed by a write operation, we avoid the need for any extra
WAL to cover the hint-bit change. I bet that's common, because when
updating you'll usually need to look at the tuples on the page and
decide whether they are visible to your scan before, say, updating one
of them.
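The deferred alternative discussed here (log hint-bit changes only when the buffer is written, only if no FPI was taken this cycle, and flush WAL before the data page) can be sketched with the same kind of toy model. All names are hypothetical, and `SMALL_REC_BYTES` is an invented size for an ordinary update record. In the read-then-update case, the FPI comes from the ordinary WAL-logged update, and writing the page adds only a WAL flush, not another record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model; all names are hypothetical, not PostgreSQL APIs. */
typedef uint64_t Lsn;             /* stand-in for XLogRecPtr */

#define FPI_BYTES 8192            /* full page at the default BLCKSZ */
#define SMALL_REC_BYTES 100       /* hypothetical ordinary update record size */

typedef struct {
    Lsn  page_lsn;                /* LSN of the last WAL record covering the page */
    bool dirty;                   /* page differs from the on-disk copy */
    bool hint_dirty;              /* hint bits changed without any WAL */
} Buffer;

typedef struct {
    Lsn redo_ptr;                 /* redo pointer of the current checkpoint */
    Lsn insert_ptr;               /* end of WAL inserted so far */
    Lsn flush_ptr;                /* end of WAL known to be on disk */
    int fpis_emitted;
} Wal;

static Lsn wal_insert(Wal *wal, Lsn size, bool is_fpi)
{
    assert(size > 0);
    wal->insert_ptr += size;
    if (is_fpi)
        wal->fpis_emitted++;
    return wal->insert_ptr;
}

/* Toy fsync: make WAL durable up to 'upto'. */
static void wal_flush(Wal *wal, Lsn upto)
{
    if (wal->flush_ptr < upto)
        wal->flush_ptr = upto;
}

/* Deferred scheme: setting a hint bit writes no WAL, it only marks the page. */
static void set_hint_bit(Buffer *buf)
{
    buf->dirty = true;
    buf->hint_dirty = true;
}

/* Ordinary WAL-logged change: carries an FPI if none was taken this cycle. */
static void logged_update(Wal *wal, Buffer *buf)
{
    bool fpi = buf->page_lsn <= wal->redo_ptr;
    buf->page_lsn = wal_insert(wal, fpi ? FPI_BYTES : SMALL_REC_BYTES, fpi);
    buf->dirty = true;
}

/* Eviction or checkpoint write of one buffer. */
static void write_buffer(Wal *wal, Buffer *buf)
{
    if (!buf->dirty)
        return;
    /* Hint-only changes not yet covered by an FPI this cycle: log one now. */
    if (buf->hint_dirty && buf->page_lsn <= wal->redo_ptr)
        buf->page_lsn = wal_insert(wal, FPI_BYTES, true);
    /* WAL-before-data: the covering WAL must be on disk before the page. */
    wal_flush(wal, buf->page_lsn);
    /* (the actual page write would happen here) */
    buf->dirty = false;
    buf->hint_dirty = false;
}
```

A page whose hint bits are set and which is then evicted without any logged update still costs one FPI plus a WAL flush at write time, which is the cost Jeff raises.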

> By the way, the approach I took was to add the heap buffer to the WAL
> chain of the XLOG_HEAP2_VISIBLE wal record when doing log_heap_visible.
> It seemed simpler to understand than trying to add a bunch of options to
> MarkBufferDirty.

Unless I am mistaken, that's going to heavily penalize the case where
the user vacuums an insert-only table. It will emit much more WAL
than currently.
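A back-of-the-envelope model of that penalty, under stated assumptions: each page marked all-visible produces one small XLOG_HEAP2_VISIBLE record of a hypothetical 64 bytes, and adding the heap buffer to the record means every page without an FPI since the last checkpoint also gets a full-page image of up to `BLCKSZ` (8192 by default). The free-space "hole" that backup blocks omit is ignored; on full insert-only pages it is small. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ 8192              /* PostgreSQL's default page size */
#define VISIBLE_REC_BYTES 64     /* hypothetical size of one XLOG_HEAP2_VISIBLE record */

/*
 * Rough WAL volume for a VACUUM that marks 'pages' heap pages all-visible.
 * 'pages_without_fpi' of them have had no FPI since the last checkpoint.
 * If the heap buffer is added to the record, each of those pages also
 * gets a full-page image (hole compression ignored).
 */
static uint64_t vacuum_visible_wal_bytes(uint64_t pages,
                                         uint64_t pages_without_fpi,
                                         bool log_heap_buffer)
{
    assert(pages_without_fpi <= pages);
    uint64_t bytes = pages * VISIBLE_REC_BYTES;
    if (log_heap_buffer)
        bytes += pages_without_fpi * (uint64_t) BLCKSZ;
    return bytes;
}
```

For an insert-only table loaded before the most recent checkpoint, every page falls into the second term, so under these assumptions 1000 pages go from 64,000 bytes of WAL to 8,256,000 bytes.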

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
