Re: 16-bit page checksums for 9.2

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: <simon(at)2ndquadrant(dot)com>,<ants(dot)aasma(at)eesti(dot)ee>, <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <jeff(dot)janes(at)gmail(dot)com>, <aidan(at)highrise(dot)ca>, <stark(at)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-04 18:32:50
Message-ID: 4F0446F20200002500044382@gw.wicourts.gov
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> we only fsync() at end-of-checkpoint. So we'd have to think about
> what to fsync, and how often, to keep the double-write buffer to a
> manageable size.

I think this is the big tuning challenge with this technology.

> I can't help thinking that any extra fsyncs are pretty expensive,
> though, especially if you have to fsync() every file that's been
> double-written before clearing the buffer. Possibly we could have
> 2^N separate buffers based on an N-bit hash of the relfilenode and
> segment number, so that we could just fsync 1/(2^N)-th of the open
> files at a time.

I'm not sure I'm following -- we would just be fsyncing those files
we actually wrote pages into, right? Not all segments for the table
involved?
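
To make the question concrete, here is a rough sketch of how I read the
2^N-buffer idea. It is an illustration only, not proposed code: the
class and function names, the CRC-based hash, and N = 2 are all
assumptions of mine. Each partition remembers only the (relfilenode,
segment) files that actually had pages double-written into it, so
clearing one partition fsyncs just those files.

```python
import zlib
from collections import defaultdict

N_BITS = 2  # assumed value of N, giving 2^N = 4 double-write buffers


def partition_for(relfilenode, segno, n_bits=N_BITS):
    """Pick a buffer from an N-bit hash of relfilenode and segment number."""
    key = relfilenode.to_bytes(4, "little") + segno.to_bytes(4, "little")
    # Keep only the low n_bits of the hash (hash choice is an assumption).
    return zlib.crc32(key) & ((1 << n_bits) - 1)


class PartitionedDoubleWrite:
    """Hypothetical bookkeeping for 2^N separate double-write buffers."""

    def __init__(self, n_bits=N_BITS):
        self.n_bits = n_bits
        # One map per buffer: (relfilenode, segno) -> block numbers written.
        self.pending = [defaultdict(list) for _ in range(1 << n_bits)]

    def add_page(self, relfilenode, segno, blockno):
        """Record a double-written page; return the buffer it landed in."""
        p = partition_for(relfilenode, segno, self.n_bits)
        self.pending[p][(relfilenode, segno)].append(blockno)
        return p

    def files_to_fsync(self, partition):
        """Only files that actually received pages in this buffer."""
        return sorted(self.pending[partition])

    def clear(self, partition, fsync_file):
        """fsync each pending file via the callback, then empty the buffer."""
        files = self.files_to_fsync(partition)
        for f in files:
            fsync_file(f)
        self.pending[partition].clear()
        return files
```

A given segment always hashes to the same buffer, so no file needs an
fsync on behalf of more than one buffer, and segments that were never
written never appear in any list.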

> But even that sounds expensive: writing back lots of dirty data
> isn't cheap. One of the systems I've been doing performance
> testing on can sometimes take >15 seconds to write a shutdown
> checkpoint,

Consider the relation-file fsyncs for double-write as a form of
checkpoint spreading, and maybe it won't seem so bad. It should
make that shutdown checkpoint less painful. Granted, I think that on
a write-heavy system with this feature you had better have a BBU
write-back cache, but that would be my recommendation anyway.

> and I'm sure that other people have similar (and worse) problems.

Well, I have no doubt that this feature should be optional. Those
who prefer can continue to do full-page writes to the WAL, instead.
Or take the "running with scissors" approach.

-Kevin
