Re: Checkpoint cost, looks like it is WAL/CRC

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, josh(at)agliodbs(dot)com, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checkpoint cost, looks like it is WAL/CRC
Date: 2005-07-07 16:22:48
Message-ID: 200507071622.j67GMmf19838@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs wrote:
> On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
> > Well, I added #1 yesterday as 'full_page_writes', and it has the same
> > warnings as fsync (namely, on crash, be prepared to recovery or check
> > your system thoroughly.
>
> Yes, which is why I comment now that the GUC alone is not enough.
>
> There is no way to "check your system thoroughly". If there is a certain
> way of knowing torn pages had *not* occurred, then I would be happy.

Yep, it is a pain, and like fsync.

> > As far as #2, my posted proposal was to write the full pages to WAL when
> > they are written to the file system, and not when they are first
> > modified in the shared buffers --- the goal being that it will even out
> > the load, and it will happen in a non-critical path, hopefully by the
> > background writer or at checkpoint time.
>
> The page must be written before the changes to the page are written, so
> that they are available sequentially in the log for replay. The log and
> the database are not connected, so we cannot do it that way. If the page
> is written out of sequence from the changes to it, how would recovery
> know where to get the page from?

See my later email --- the full page will be restored later from WAL, so
our changes don't have to be made at that point.

> ISTM there is mileage in your idea of trying to shift the work to
> another time. My thought is "which blocks exactly are the ones being
> changed?". Maybe that would lead to a reduction.
>
> > > With wal_changed_pages= off *any* crash would possibly require an
> > > archive recovery, or a replication rebuild. It's good that we now have
> > > PITR, but we do also have other options for availability. Users of
> > > replication could well be amongst the first to try out this option.
> >
> > Seems it is similar to fsync in risk, which is not a new option.
>
> Risk is not acceptable. We must have certainty, either way.
>
> Why have two GUCs? Why not just have one GUC that does both at the same
> time? When would you want one but not the other?
> risk_data_loss_to_gain_performance = true

Yep, one new one might make sense.

> > I think if we document full_page_writes as similar to fsync in risk, we
> > are OK for 8.1, but if something can be done easily, it sounds good.
>
> Documenting something simply isn't enough. I simply cannot advise
> anybody ever to use the new GUC. If their data was low value, they
> wouldn't even be using PostgreSQL, they'd use a non-transactional DBMS.
>
> I agree we *must* have the GUC, but we also *must* have a way for crash
> recovery to tell us for certain that it has definitely worked, not just
> maybe worked.

Right. I am thinking your CRC write to WAL might do that.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hans-Jürgen Schönig 2005-07-07 16:23:40 Re: A couple of patches for PostgreSQL 64bit support
Previous Message Bruce Momjian 2005-07-07 16:11:05 Re: Checkpoint cost, looks like it is WAL/CRC