Re: COMMIT NOWAIT Performance Option

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Subject: Re: COMMIT NOWAIT Performance Option
Date: 2007-02-28 21:13:08
Message-ID: 87d53uugt7.fsf@stark.xeocode.com
Lists: pgsql-hackers

"Jonah H. Harris" <jonah(dot)harris(at)gmail(dot)com> writes:

> Which is, of course, how everyone else does it.

I happen to agree with your conclusion, but this line of argument is
exceptionally unconvincing. In fact, in this crowd you'll tend to turn people
off if you say things like that, rather than convince anyone of anything.

> Even pages from the last checkpoint would be a killer.

Hm, that's an interesting thought. We only really have to check pages that
would have received a full page write since the last checkpoint. So if turning
full page writes off still recorded the page ids of the pages that *would*
have been written, then the code that normally replays full page writes would
just need to check the page's checksum whenever the page data isn't available.
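
A minimal sketch of that idea, in Python rather than PostgreSQL's C. Every
name here (WalWriter, the "page_id_only" record type, the disk dictionary
mapping a page id to a stored checksum and payload) is hypothetical and made
up for illustration; none of it is existing PostgreSQL code or WAL format.

```python
import zlib
from dataclasses import dataclass, field


def checksum(payload: bytes) -> int:
    # CRC32 stands in for whatever page checksum would really be used.
    return zlib.crc32(payload)


@dataclass
class WalWriter:
    full_page_writes: bool
    records: list = field(default_factory=list)
    touched: set = field(default_factory=set)  # pages touched since checkpoint

    def checkpoint(self) -> None:
        self.touched.clear()
        self.records.append(("checkpoint",))

    def log_change(self, page_id: int, page_image: bytes) -> None:
        # The first touch after a checkpoint is where a full page write
        # would normally be emitted.
        if page_id not in self.touched:
            self.touched.add(page_id)
            if self.full_page_writes:
                self.records.append(("full_page", page_id, page_image))
            else:
                # Assumption: record only the id of the page that *would*
                # have been written in full.
                self.records.append(("page_id_only", page_id))
        self.records.append(("change", page_id))


def pages_to_verify(records: list) -> list:
    # Page ids recorded without page data since the last checkpoint.
    ids = []
    for rec in records:
        if rec[0] == "checkpoint":
            ids = []
        elif rec[0] == "page_id_only":
            ids.append(rec[1])
    return ids


def find_torn_pages(records: list, disk: dict) -> list:
    # disk maps page_id -> (stored_checksum, payload).
    return [pid for pid in pages_to_verify(records)
            if checksum(disk[pid][1]) != disk[pid][0]]


wal = WalWriter(full_page_writes=False)
wal.checkpoint()
wal.log_change(1, b"before-1")
wal.log_change(1, b"before-1b")   # second touch: no new page id record
wal.log_change(2, b"before-2")

disk = {
    1: (checksum(b"page one"), b"page one"),
    2: (checksum(b"page two"), b"page tw0"),  # simulated torn page
}
torn = find_torn_pages(wal.records, disk)
```

Only pages 1 and 2 are checked during this simulated replay, however many
other pages exist on disk, and only page 2 fails its checksum.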

I can't see how that would be a killer. No matter how large a system you're
talking about, you're going to tune checkpoints to occur at about the same
interval anyway. So the amount of time the WAL replay checksum checking
takes will be more or less constant.

In fact, we're already reading in most, if not all, of those pages anyway,
since we're replaying WAL records that touch them. Would we even have to do
anything extra? If we check checksums whenever we read in a page, surely the
WAL replay code would automatically detect any torn pages without any special
attention.
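
To illustrate why a checksum verified on every page read would catch a torn
page, here is a small simulation. The page layout (a 4-byte CRC32 in front of
the payload), the 512-byte sector size, and all function names are
assumptions chosen for the example, not PostgreSQL's actual page format.

```python
import zlib

PAGE_SIZE = 8192     # assumed page size
SECTOR_SIZE = 512    # assumed unit the disk writes atomically
CHECKSUM_LEN = 4


def seal_page(payload: bytes) -> bytes:
    # Build a page: CRC32 of the payload, followed by the payload itself.
    assert len(payload) == PAGE_SIZE - CHECKSUM_LEN
    return zlib.crc32(payload).to_bytes(CHECKSUM_LEN, "big") + payload


def page_is_intact(page: bytes) -> bool:
    # The check a reader would run every time it loads a page.
    stored = int.from_bytes(page[:CHECKSUM_LEN], "big")
    return stored == zlib.crc32(page[CHECKSUM_LEN:])


def torn_write(old_page: bytes, new_page: bytes, sectors_written: int) -> bytes:
    # Simulate a crash after only the first few sectors reached the disk.
    cut = sectors_written * SECTOR_SIZE
    return new_page[:cut] + old_page[cut:]


old_page = seal_page(b"A" * (PAGE_SIZE - CHECKSUM_LEN))
new_page = seal_page(b"B" * (PAGE_SIZE - CHECKSUM_LEN))

# Crash after 3 of the 16 sectors: new checksum, mixed old and new payload.
torn = torn_write(old_page, new_page, sectors_written=3)

print(page_is_intact(old_page), page_is_intact(new_page), page_is_intact(torn))
```

The torn page carries the new checksum but a mix of old and new payload, so
the ordinary read-time check rejects it with no replay-specific code.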

That also makes it clear just how awful full page writes are for scalability.
As you scale up the system but try to keep checkpoint intervals constant,
you're less and less likely to ever see the same page twice between two
checkpoints. So as you scale the system up, more and more of the WAL will
consist of full page writes.
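
A rough back-of-the-envelope model of that effect. It assumes page updates
are spread uniformly at random over the database and that the data grows
while the number of updates per checkpoint interval stays fixed. Both are
simplifying assumptions made for this sketch, not claims about real
workloads, and the function name is made up.

```python
import math


def full_page_write_fraction(num_pages: int, updates_per_interval: int) -> float:
    """Expected fraction of page updates that are the first touch of their
    page since the last checkpoint, i.e. that would trigger a full page write.

    With uniform random access, the expected number of distinct pages touched
    by k updates over N pages is N * (1 - (1 - 1/N)**k).
    """
    n, k = num_pages, updates_per_interval
    # expm1/log1p keep the result accurate when 1/N is tiny.
    distinct = n * -math.expm1(k * math.log1p(-1.0 / n))
    return distinct / k


for pages in (1_000, 100_000, 10_000_000):
    frac = full_page_write_fraction(pages, updates_per_interval=10_000)
    print(f"{pages:>10} pages: {frac:.3f} of updates need a full page write")
```

Under these assumptions the fraction climbs toward 1 as the page count grows:
on a large enough system nearly every update between two checkpoints hits a
page for the first time.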

> All of the databases (Oracle, SQL Server, DB2) have a way to perform a
> database corruption check which does go out and verify all checksums.

Which is pretty poor design. If we implemented an fsck-like tool, I would be
far more interested in checking things like "tuples don't overlap" or "hint
bits are set correctly" and so on. Checksums do nothing to protect against
software failures, which are the only kind of failure with a good rationale
for being checked in an external tool.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
