Re: COMMIT NOWAIT Performance Option

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Subject: Re: COMMIT NOWAIT Performance Option
Date: 2007-02-28 11:10:04
Message-ID: 87bqjepmgj.fsf@stark.xeocode.com
Lists: pgsql-hackers

"Jonah H. Harris" <jonah(dot)harris(at)gmail(dot)com> writes:

> First, rather than using 16-bytes per page and having to deal with
> handling the non-contiguous space, why not just use a page-level
> checksum like everyone else? Most of the systems I've seen seem to
> employ a simple CRC16 or CRC32.

I think a CRC would be a useful feature for people who want an extra degree of
protection from faulty hardware.

But we've already seen that CRC checks can be expensive, and not everyone will
want to take the CPU hit. Storing a one-byte counter in each 512-byte sector of
every block is cheap.

And the idea came from what someone said MSSQL does, so "like everyone else"
(which isn't a very compelling argument to begin with) doesn't argue against
it.

> Second, unless I'm missing something, I don't see how your algorithm
> is going to work as each 512 byte chunk of the block will *always*
> have the same sequential byte value. That is, unless you have some
> way of preventing wraparound at 255 without adding additional block
> overhead.

I think the way it would work is for the smgr to note the sequence value it
found when it read in a page, and then increment that value by one when it
writes the page out. Conveniently, the usable page would be 16 bytes shorter
than 8 kB, so every buffer has 16 spare bytes in which to note information such
as the last sequence tag the buffer used.
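A minimal C sketch of the tagging scheme, under an assumed layout and hypothetical names (BLOCK_SIZE, SECTOR_SIZE, block_stamp_next and block_tags_consistent are illustrative, not smgr code). On write, every 512-byte sector gets the previous tag plus one in its last byte. On read, the block is accepted only if all 16 tags agree. Because the new tag always differs from the old one, even across the 255 -> 0 wrap, a torn write leaves a mix of old and new tags that the check catches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout (illustrative only): an 8192-byte block split into 16
 * sectors of 512 bytes; the last byte of each sector holds the tag. */
#define BLOCK_SIZE  8192
#define SECTOR_SIZE 512
#define NSECTORS    (BLOCK_SIZE / SECTOR_SIZE)   /* 16 */

/* Read side: if every sector carries the same tag, no sector is left over
 * from an earlier write, so the block was not torn. The tag is returned
 * through *tag so the buffer can remember it for the next write. */
static bool block_tags_consistent(const uint8_t *block, uint8_t *tag)
{
    uint8_t first = block[SECTOR_SIZE - 1];
    for (size_t i = 1; i < NSECTORS; i++)
        if (block[i * SECTOR_SIZE + SECTOR_SIZE - 1] != first)
            return false;
    *tag = first;
    return true;
}

/* Write side: stamp every sector with last_tag + 1. The uint8_t cast wraps
 * 255 -> 0, which is harmless: the check only compares the sectors of one
 * block with each other, and the new tag never equals the old one. */
static uint8_t block_stamp_next(uint8_t *block, uint8_t last_tag)
{
    uint8_t next = (uint8_t)(last_tag + 1);
    for (size_t i = 0; i < NSECTORS; i++)
        block[i * SECTOR_SIZE + SECTOR_SIZE - 1] = next;
    return next;
}
```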

> Lastly, from a performance perspective, it's going to be faster to
> compute the entire page's checksum than it would be to check the
> sequence every 512 bytes and perform the space adjustment.

That seems pretty unlikely. CRC checks are expensive CPU-wise. We're already
paying for a copy because we use read/write, so the difference between a
read/write of 8192 bytes and a readv/writev of 511b*16+1b*16 is going to be
non-zero but very small: thousands of times quicker than the CRC.
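The readv/writev approach could look roughly like the following hypothetical sketch (not PostgreSQL code; write_block, read_block and build_iov are illustrative names). It maps a contiguous 8176-byte in-memory page plus 16 tag bytes onto an 8192-byte on-disk block using 32 iovecs, so the kernel does the interleaving during the copy it already performs. It assumes IOV_MAX is at least 32 (true on Linux) and ignores short reads and writes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BLOCK_SIZE   8192
#define SECTOR_SIZE  512
#define NSECTORS     (BLOCK_SIZE / SECTOR_SIZE)   /* 16 */
#define DATA_PER_SEC (SECTOR_SIZE - 1)            /* 511 */
#define PAGE_DATA    (NSECTORS * DATA_PER_SEC)    /* 8176 */

_Static_assert(PAGE_DATA + NSECTORS == BLOCK_SIZE,
               "511*16 data bytes + 16 tag bytes = 8192");

/* Lay out 32 iovecs: 511 page bytes, then 1 tag byte, repeated 16 times. */
static void build_iov(struct iovec iov[2 * NSECTORS], uint8_t *page,
                      uint8_t tags[NSECTORS])
{
    for (int i = 0; i < NSECTORS; i++) {
        iov[2 * i].iov_base = page + (size_t)i * DATA_PER_SEC;
        iov[2 * i].iov_len = DATA_PER_SEC;
        iov[2 * i + 1].iov_base = &tags[i];
        iov[2 * i + 1].iov_len = 1;
    }
}

/* Write one block at the current offset, stamping every sector with tag. */
static ssize_t write_block(int fd, uint8_t *page, uint8_t tag)
{
    uint8_t tags[NSECTORS];
    struct iovec iov[2 * NSECTORS];
    memset(tags, tag, sizeof tags);
    build_iov(iov, page, tags);
    return writev(fd, iov, 2 * NSECTORS);
}

/* Read one block back; tags[] receives the 16 per-sector tag bytes. */
static ssize_t read_block(int fd, uint8_t *page, uint8_t tags[NSECTORS])
{
    struct iovec iov[2 * NSECTORS];
    build_iov(iov, page, tags);
    return readv(fd, iov, 2 * NSECTORS);
}
```

After read_block, a caller would accept the page only if all 16 entries of tags[] are equal.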

If we went to direct I/O, this scheme would entail an additional memory copy,
which would be annoying, but that would still be much, much cheaper than a CRC
check. The best we could do in that case would be to compute the CRC during the
same pass as the memory copy.
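For the direct-I/O case, the copy and the CRC can share a single pass over the data. The sketch below is illustrative only. It uses a plain bitwise CRC-32 (the common reflected polynomial 0xEDB88320) rather than any particular PostgreSQL CRC routine, and unpack_block_crc is a hypothetical name. It de-interleaves the sector tags while copying the block image into a contiguous page and folds each data byte into the CRC as it goes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   8192
#define SECTOR_SIZE  512
#define NSECTORS     (BLOCK_SIZE / SECTOR_SIZE)   /* 16 */
#define DATA_PER_SEC (SECTOR_SIZE - 1)            /* 511 */
#define PAGE_DATA    (NSECTORS * DATA_PER_SEC)    /* 8176 */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), one byte at a time. */
static uint32_t crc32_byte(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

/* Standalone CRC-32 over a buffer, for comparison with the fused version. */
static uint32_t crc32_buf(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++)
        crc = crc32_byte(crc, p[i]);
    return crc ^ 0xFFFFFFFFu;
}

/* Copy a raw 8192-byte block image into a contiguous 8176-byte page,
 * extracting the 16 tag bytes and checksumming the data bytes in the
 * same loop, so the CRC costs no extra pass over memory. */
static uint32_t unpack_block_crc(const uint8_t *disk, uint8_t *page,
                                 uint8_t tags[NSECTORS])
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t s = 0; s < NSECTORS; s++) {
        const uint8_t *src = disk + s * SECTOR_SIZE;
        uint8_t *dst = page + s * DATA_PER_SEC;
        for (size_t i = 0; i < DATA_PER_SEC; i++) {
            dst[i] = src[i];
            crc = crc32_byte(crc, src[i]);
        }
        tags[s] = src[DATA_PER_SEC];
    }
    return crc ^ 0xFFFFFFFFu;
}
```

A table-driven or hardware-assisted CRC would be far faster than this bit-at-a-time loop; the point is only that the checksum rides along with the copy instead of requiring a second pass.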

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
