Re: WALInsertLock tuning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WALInsertLock tuning
Date: 2011-06-06 22:25:08
Message-ID: BANLkTimyiSMiVs90qSzJbUoFh--wFiFFEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 6, 2011 at 6:05 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Mon, Jun 6, 2011 at 10:09 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> On Mon, Jun 6, 2011 at 11:27 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>
>>> But that even assumes we write the unzeroed data at the end of the
>>> buffer. We don't. We only write data up to the end of the WAL record
>>> on the current page, unless we do a continuation record,
>>
>> I see no codepath in XLogWrite which writes anything other than full
>> buffer pages.
>
> Second line, boolean variable called "ispartialpage".

That affects our tracking of the write and flush positions, but not
what we actually write to the file:

/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;
errno = 0;
if (write(openLogFile, from, nbytes) != nbytes)

As you can see, nbytes is always a multiple of XLOG_BLCKSZ.

> As I mentioned, even if spare bytes at the end of page were written,
> they aren't ever read except in corner case bugs that would be trapped
> by other logic put there to protect us.
>
> We're safe. If I didn't believe it and hadn't tested it, I wouldn't speak.

I think you need to do some analysis of how much this actually
improves performance. I don't like the idea of applying performance
patches without performance testing; sometimes the results are
counterintuitive. And even when they're not, it's nice to know how
much you've gained.

As to the question of whether it's safe, I think I'd agree that the
chances of this backfiring are pretty remote. I think that with the
zeroing they are exactly zero, because (now that we start XLOG
positions at 0/1) there is now way that xl_prev = {0, 0} can ever be
valid. Without the zeroing, well, it's remotely possible that xl_prev
could happen to appear valid and that xl_crc could happen to match...
but the chances are presumably quite remote. Just the chances of the
CRC matching should be around 2^-32. The chances of an xl_prev match
are harder to predict, because the matching values for CRCs should be
pretty much uniformly distributed, while xl_prev isn't random. But
even given that the chance of a match is should be very small, so in
practice there is likely no harm. It strikes me, though, that we
could probably get nearly all of the benefit of this patch by being
willing to zero the first sizeof(XLogRecord) bytes following a record,
but not the rest of the buffer. That would pretty much wipe out any
chance of an xl_prev match, I think, and would likely still get nearly
all of the performance benefit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Brar Piening 2011-06-06 22:27:18 Re: Visual Studio 2010/Windows SDK 7.1 support
Previous Message Simon Riggs 2011-06-06 22:15:12 Re: reducing the overhead of frequent table locks - now, with WIP patch