Re: Page Checksums + Double Writes

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: <alvherre(at)commandprompt(dot)com>,<david(at)fetter(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Page Checksums + Double Writes
Date: 2011-12-23 16:14:06
Message-ID: 4EF4546E0200002500044091@gw.wicourts.gov
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

>> I would suggest you examine how to have an array of N bgwriters,
>> then just slot the code for hinting into the bgwriter. That way a
>> bgwriter can set hints, calc CRC and write pages in sequence on a
>> particular block. The hinting needs to be synchronised with the
>> writing to give good benefit.
>
> I'll think about that. I see pros and cons, and I'll have to see
> how those balance out after I mull them over.

I think maybe the best solution is to create some common code to use
from both. The problem with *just* doing it in bgwriter is that it
would not help much with workloads like Robert has been using for
most of his performance testing -- a database which fits entirely in
shared buffers and starts thrashing on CLOG. For a background
hinter process my goal would be to deal with xids as they are passed
by the global xmin value, so that you have a cheap way to know that
they are ripe for hinting, and you can frequently hint a bunch of
transactions that are all in the same CLOG page which is recent
enough to likely be already loaded.
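
To illustrate why xids that pass the global xmin together tend to
share a CLOG page, here is a minimal sketch. It assumes the default
8 kB block size and two status bits per transaction (so four
transactions per byte); the SKETCH_* names and both helper functions
are made up for illustration and are not taken from the PostgreSQL
source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: default 8 kB block, 2 status bits per transaction. */
#define SKETCH_BLCKSZ          8192u
#define SKETCH_XACTS_PER_BYTE  4u
#define SKETCH_XACTS_PER_PAGE  (SKETCH_BLCKSZ * SKETCH_XACTS_PER_BYTE)

/* Hypothetical helper: which CLOG page holds the status of this xid? */
static uint32_t
clog_page_of(uint32_t xid)
{
    return xid / SKETCH_XACTS_PER_PAGE;
}

/* Hypothetical helper: can both xids be resolved from one CLOG page? */
static bool
same_clog_page(uint32_t a, uint32_t b)
{
    return clog_page_of(a) == clog_page_of(b);
}
```

Under those assumptions one page covers 32768 consecutive xids, so a
batch of xids that just fell behind the global xmin will usually need
only one or two page lookups.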

Now, a background hinter isn't going to be a net win if it has to
grovel through every tuple on every dirty page every time it sweeps
through the buffers, so the idea depends on having a sufficiently
efficient way to identify interesting buffers. I'm hoping to
improve on this, but my best idea so far is to add a field to the
buffer header for "earliest unhinted xid" for the page. Whenever
this background process wakes up and is scanning through the buffers
(probably just in buffer number order), it does a quick check,
without any pin or lock, to see if the buffer is dirty and the
earliest unhinted xid is below the global xmin. If it passes both
of those tests, there is definitely useful work which can be done if
the page doesn't get evicted before we can do it. We pin the page,
recheck those conditions, and then we look at each tuple and hint
where possible. As we go, we remember the earliest xid that we see
which is *not* being hinted, to store back into the buffer header
when we're done. Of course, we would also update the buffer header
for new tuples or when an xmax is set if the xid involved precedes
what we have in the buffer header.
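
The sweep described above could be sketched roughly as follows. This
is a single-threaded toy model, not PostgreSQL code: every type and
function name here (SketchBuffer, SketchTuple, hint_buffer, and so
on) is hypothetical, the pin is just a counter, and "hinting" only
flips a flag where real code would look up commit status in CLOG and
take the appropriate locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;        /* hypothetical stand-in for a transaction id */
#define INVALID_XID 0u       /* "no xid" / "nothing left to hint" */

typedef struct {
    Xid  xmin, xmax;         /* xmax == INVALID_XID means not set */
    bool xmin_hinted, xmax_hinted;
} SketchTuple;

typedef struct {
    bool         dirty;
    int          pin_count;
    Xid          earliest_unhinted_xid;  /* the proposed new header field */
    SketchTuple *tuples;
    size_t       ntuples;
} SketchBuffer;

/* Wraparound-aware "a is older than b" (modulo-2^32 comparison). */
static bool xid_precedes(Xid a, Xid b) { return (int32_t)(a - b) < 0; }

/* The cheap test: dirty, and something on the page is ripe for hinting. */
static bool worth_visiting(const SketchBuffer *buf, Xid global_xmin)
{
    return buf->dirty
        && buf->earliest_unhinted_xid != INVALID_XID
        && xid_precedes(buf->earliest_unhinted_xid, global_xmin);
}

/* Remember xid if it is older than the earliest unhinted xid seen so far. */
static void note_unhinted(Xid *earliest, Xid xid)
{
    if (*earliest == INVALID_XID || xid_precedes(xid, *earliest))
        *earliest = xid;
}

/* Hint one xid field if it is behind global xmin; otherwise remember it. */
static int hint_field(Xid xid, bool *hinted, Xid global_xmin, Xid *earliest)
{
    if (xid == INVALID_XID || *hinted)
        return 0;
    if (xid_precedes(xid, global_xmin)) {
        *hinted = true;      /* real code would consult CLOG here */
        return 1;
    }
    note_unhinted(earliest, xid);
    return 0;
}

/* One visit by the background hinter; returns the number of hints set. */
static int hint_buffer(SketchBuffer *buf, Xid global_xmin)
{
    int hinted = 0;

    assert(buf != NULL);
    if (!worth_visiting(buf, global_xmin))   /* quick check, no pin or lock */
        return 0;
    buf->pin_count++;                        /* pin */
    if (worth_visiting(buf, global_xmin)) {  /* recheck once pinned */
        Xid earliest = INVALID_XID;
        for (size_t i = 0; i < buf->ntuples; i++) {
            SketchTuple *t = &buf->tuples[i];
            hinted += hint_field(t->xmin, &t->xmin_hinted, global_xmin, &earliest);
            hinted += hint_field(t->xmax, &t->xmax_hinted, global_xmin, &earliest);
        }
        buf->earliest_unhinted_xid = earliest;  /* store back into header */
    }
    buf->pin_count--;                        /* unpin */
    return hinted;
}

/* Called when a tuple is added or an xmax is set on the page. */
static void buffer_note_new_xid(SketchBuffer *buf, Xid xid)
{
    note_unhinted(&buf->earliest_unhinted_xid, xid);
}
```

In this model a page whose remaining unhinted xids are all at or
beyond the global xmin fails the quick check and is skipped without
being pinned, which is what keeps the sweep cheap.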

This would not only help avoid multiple page writes as unhinted
tuples on the page are read, it would also minimize thrashing on
CLOG and move some of the hinting work from the critical path of
reading a tuple into a background process.

Thoughts?

-Kevin
