Re: Block-level CRC checks

From: Richard Huxton <dev(at)archonet(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-12-01 22:46:32
Message-ID: 4B159CC8.9030201@archonet.com
Lists: pgsql-hackers

Greg Stark wrote:
> On Tue, Dec 1, 2009 at 9:57 PM, Richard Huxton <dev(at)archonet(dot)com> wrote:
>> Why are we writing out the hint bits to disk anyway? Is it really so
>> slow to calculate them on read + cache them that it's worth all this
>> trouble? Are they not also to blame for the "write my import data twice"
>> feature?
>
> It would be interesting to experiment with different strategies. But
> the results would depend a lot on workloads and I doubt one strategy
> is best for everyone.
>
> It has often been suggested that we could set the hint bits but not
> dirty the page, so they would never be written out unless some other
> update hit the page. In most use cases that would probably result in
> the right thing happening where we avoid half the writes but still
> stop doing transaction status lookups relatively promptly. The scary
> thing is that there might be use cases such as static data loaded
> where the hint bits never get set and every scan of the page has to
> recheck those statuses until the tuples are frozen.

And how scary is that? Assuming we cache the hints...
1. With the page itself, so same lifespan
2. Separately, perhaps with a different (longer) lifespan.

Caching them separately would then let you trade complexity for
compactness - "all of block B is deleted", "all of table T is visible".
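
A minimal sketch of what option 2 might look like, purely to illustrate the idea of coarse-grained entries. Every name here (HintCache, slow_lookup, the table names) is hypothetical and nothing below is PostgreSQL's actual API. It also ignores snapshots entirely and treats "visible" as a plain boolean, which is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

# Hypothetical names throughout; none of this is PostgreSQL's actual API.
TupleId = Tuple[str, int, int]  # (table, block number, item offset)


@dataclass
class HintCache:
    """Hints kept apart from the pages, at three granularities."""
    # "all of table T is visible"
    visible_tables: Set[str] = field(default_factory=set)
    # "all of block B is deleted"
    deleted_blocks: Set[Tuple[str, int]] = field(default_factory=set)
    # per-tuple fallback, filled in lazily
    tuple_hints: Dict[TupleId, bool] = field(default_factory=dict)

    def is_visible(self, tid: TupleId,
                   slow_lookup: Callable[[TupleId], bool]) -> bool:
        table, block, _ = tid
        # One coarse entry answers for every tuple it covers.
        if (table, block) in self.deleted_blocks:
            return False
        if table in self.visible_tables:
            return True
        if tid not in self.tuple_hints:
            # Cache miss: pay for the transaction status lookup once.
            self.tuple_hints[tid] = slow_lookup(tid)
        return self.tuple_hints[tid]
```

In this sketch a single set entry stands in for every tuple in a block or table, which is the compactness side of the trade. The complexity side is keeping those coarse entries correct when anything on the covered pages changes, and the sketch does not attempt that.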

So what is the cost of calculating the hint-bits for a whole block of
tuples in one go vs reading that block from actual spinning disk?
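
For a rough feel of that comparison, a back-of-envelope sketch. The function name and all three inputs are placeholder assumptions, not measurements; the point is only the shape of the arithmetic.

```python
def hint_cost_vs_disk_read(tuples_per_block, status_lookup_us, disk_read_us):
    """Return (cost in us of recomputing hints for one block,
    that cost as a fraction of one disk read)."""
    recompute_us = tuples_per_block * status_lookup_us
    return recompute_us, recompute_us / disk_read_us


# Placeholder inputs only: 100 tuples per block, 1 us per status lookup
# that is answered from memory, 8000 us (8 ms) for one random read from
# a spinning disk.
recompute_us, ratio = hint_cost_vs_disk_read(100, 1.0, 8000.0)
print(f"recompute: {recompute_us:.0f} us, {ratio:.2%} of one disk read")
```

The answer swings entirely on status_lookup_us. If each status lookup is itself a random disk read, set it equal to disk_read_us and the ratio becomes tuples_per_block, i.e. recomputing costs as many disk reads as there are tuples on the page.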

> There does need to be something like the hint bits which does
> eventually have to be set because we can't keep transaction
> information around forever. Even if you keep the transaction
> information all the way back to the last freeze date (up to about 1GB
> and change I think) then the data has to be written twice, the second
> time is to freeze the transactions. In the worst case then reading a
> page requires a random page access (or two) from anywhere in that 1GB+
> file for each tuple on the page (whether visible to us or not).

While on that topic - I'm assuming freezing requires substantially more
effort than updating hint bits?

--
Richard Huxton
Archonet Ltd
