Re: Block-level CRC checks

From: decibel <decibel(at)decibel(dot)org>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Richard Huxton <dev(at)archonet(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block-level CRC checks
Date: 2009-12-01 23:45:45
Message-ID: E8DA5FEA-F230-453E-817E-40F51FE86EAA@decibel.org
Lists: pgsql-hackers

On Dec 1, 2009, at 4:13 PM, Greg Stark wrote:
> On Tue, Dec 1, 2009 at 9:57 PM, Richard Huxton <dev(at)archonet(dot)com> wrote:
>> Why are we writing out the hint bits to disk anyway? Is it really so
>> slow to calculate them on read + cache them that it's worth all this
>> trouble? Are they not also to blame for the "write my import data twice"
>> feature?
>
> It would be interesting to experiment with different strategies. But
> the results would depend a lot on workloads and I doubt one strategy
> is best for everyone.

I agree that we'll always have the issue with freezing. But I also
think it's time to revisit the whole idea of hint bits. AFAIK we only
keep at maximum 2B transactions, and each one takes 2 bits in CLOG.
So worst-case scenario, we're looking at 4 Gbit (about 512MB) of clog.
On modern hardware, that's not a lot. And that's also assuming that we
don't do any kind of compression on that data (obviously we couldn't
use just any old compression algorithm, but there are certainly tricks
that could be used to reduce the size of this information).
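
As a rough check on the arithmetic above, here is a minimal Python sketch
of the uncompressed clog footprint. It assumes exactly 2 status bits per
transaction (as stated above) and the default 8kB page size; the function
names are made up for illustration and are not anything in the backend.

```python
# Back-of-the-envelope clog sizing; not PostgreSQL code.
BITS_PER_XACT = 2   # from the message: two status bits per transaction
PAGE_SIZE = 8192    # assumption: default 8kB block size


def clog_bytes(num_xacts: int, bits_per_xact: int = BITS_PER_XACT) -> int:
    """Bytes needed to store status bits for num_xacts transactions, uncompressed."""
    total_bits = num_xacts * bits_per_xact
    return (total_bits + 7) // 8  # round up to whole bytes


def clog_pages(num_xacts: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed to hold that many bytes (rounded up)."""
    return -(-clog_bytes(num_xacts) // page_size)


if __name__ == "__main__":
    for label, xacts in (("2^31 xids", 2**31), ("2^32 xids", 2**32)):
        mib = clog_bytes(xacts) / 2**20
        print(f"{label}: {mib:.0f} MiB in {clog_pages(xacts)} pages")
```

With these assumptions, 2^31 transactions come to 512 MiB, and the full
2^32 xid space comes to 1 GiB, which lines up with the "1GB and change"
figure Greg mentions below.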

I know this is something that folks at EnterpriseDB have looked at;
perhaps there's data they can share.

> It has often been suggested that we could set the hint bits but not
> dirty the page, so they would never be written out unless some other
> update hit the page. In most use cases that would probably result in
> the right thing happening where we avoid half the writes but still
> stop doing transaction status lookups relatively promptly. The scary
> thing is that there might be use cases such as static data loaded
> where the hint bits never get set and every scan of the page has to
> recheck those statuses until the tuples are frozen.
>
> (Not dirtying the page almost gets us out of the CRC problems -- it
> doesn't in our current setup because we don't take a lock when setting
> the hint bits, so you could set it on a page someone is in the middle
> of CRC checking and writing. There were other solutions proposed for
> that, including just making hint bits require locking the page or
> double buffering the write.)
>
> There does need to be something like the hint bits which does
> eventually have to be set because we can't keep transaction
> information around forever. Even if you keep the transaction
> information all the way back to the last freeze date (up to about 1GB
> and change I think) then the data has to be written twice, the second
> time is to freeze the transactions. In the worst case then reading a
> page requires a random page access (or two) from anywhere in that 1GB+
> file for each tuple on the page (whether visible to us or not).
> --
> greg
>

--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
