Re: Checkpoint cost, looks like it is WAL/CRC

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, josh(at)agliodbs(dot)com, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checkpoint cost, looks like it is WAL/CRC
Date: 2005-07-06 22:52:11
Message-ID: 1120690332.3940.206.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
> Well, I added #1 yesterday as 'full_page_writes', and it has the same
> warnings as fsync (namely, on crash, be prepared to recovery or check
> your system thoroughly.

Yes, which is why I comment now that the GUC alone is not enough.

There is no way to "check your system thoroughly". If there is a certain
way of knowing torn pages had *not* occurred, then I would be happy.

> As far as #2, my posted proposal was to write the full pages to WAL when
> they are written to the file system, and not when they are first
> modified in the shared buffers --- the goal being that it will even out
> the load, and it will happen in a non-critical path, hopefully by the
> background writer or at checkpoint time.

The page must be written before the changes to the page are written, so
that they are available sequentially in the log for replay. The log and
the database are not connected, so we cannot do it that way. If the page
is written out of sequence from the changes to it, how would recovery
know where to get the page from?

ISTM there is mileage in your idea of trying to shift the work to
another time. My thought is "which blocks exactly are the ones being
changed?". Maybe that would lead to a reduction.

> > With wal_changed_pages= off *any* crash would possibly require an
> > archive recovery, or a replication rebuild. It's good that we now have
> > PITR, but we do also have other options for availability. Users of
> > replication could well be amongst the first to try out this option.
>
> Seems it is similar to fsync in risk, which is not a new option.

Risk is not acceptable. We must have certainty, either way.

Why have two GUCs? Why not just have one GUC that does both at the same
time? When would you want one but not the other?
risk_data_loss_to_gain_performance = true

> I think if we document full_page_writes as similar to fsync in risk, we
> are OK for 8.1, but if something can be done easily, it sounds good.

Documenting something simply isn't enough. I simply cannot advise
anybody ever to use the new GUC. If their data was low value, they
wouldn't even be using PostgreSQL, they'd use a non-transactional DBMS.

I agree we *must* have the GUC, but we also *must* have a way for crash
recovery to tell us for certain that it has definitely worked, not just
maybe worked.

Best regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2005-07-07 00:17:23 Re: Checkpoint cost, looks like it is WAL/CRC
Previous Message Bruce Momjian 2005-07-06 22:38:30 Re: Schedule for 8.1 feature freeze