Re: CLOG contention, part 2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CLOG contention, part 2
Date: 2012-02-08 23:26:52
Message-ID: CA+Tgmoa38nvEzVPbJJ4RFWQMyAWtzj_Oj6xBcAk8aQ4BguOJew@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 29, 2012 at 6:04 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Sun, Jan 29, 2012 at 9:41 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>> If I cast to a int, then I see advancement:
>
> I'll initialise it as 0 rather than -1, and then we don't have a
> problem in any circumstance.
>
>
>>> I've specifically designed the pgbench changes required to simulate
>>> conditions of clog contention to help in the evaluation of this patch.
>>
>> Yep, I've used that one for the testing.
>
> Most of the current patch is just bookkeeping to keep track of the
> point when we can look at the history in a read-only manner.
>
> I've isolated the code better to allow you to explore various
> implementation options. I don't see any performance difference between
> any of them really, but you're welcome to look.
>
> Please everybody note that the clog history doesn't even become active
> until the first checkpoint, so this is dead code until we've hit the
> first checkpoint cycle and completed a million transactions since
> startup. So it's designed to tune for real-world situations, and is
> not easy to benchmark. (Maybe we could start earlier, but having
> extra code just for the first few minutes seems a waste of energy,
> especially since we must also hit a million xids).

I find that this version does not compile:

clog.c: In function ‘TransactionIdGetStatus’:
clog.c:431: error: ‘clog’ undeclared (first use in this function)
clog.c:431: error: (Each undeclared identifier is reported only once
clog.c:431: error: for each function it appears in.)
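
For reference, unpatched clog.c routes that lookup through the
ClogCtl macro, so presumably the patched code means to reference
something similar (snippet below is from the existing sources,
abbreviated, not from the patch):

    /* in clog.c */
    static SlruCtlData ClogCtlData;
    #define ClogCtl (&ClogCtlData)

    /* in TransactionIdGetStatus() */
    slotno = SimpleLruReadPage_ReadOnly(ClogCtl, pageno, xid);
    byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;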

Given that, I obviously cannot test this at this point, but let me go
ahead and theorize about how well it's likely to work. What Tom
suggested before (and after some reflection I think I believe it) is
that the frequency of access will be highest for the newest CLOG page
and then drop off for each page further back you go. Clearly, if that
drop-off is fast - e.g. each buffer further backward is half as likely
to be accessed as the next newer one - then the fraction of accesses
that will hit pages that are far enough back to benefit from this
optimization will be infinitesimal; 1023 out of every 1024 accesses
will hit the first ten pages, and on a high-velocity system those all
figure to have been populated since the last checkpoint. The best
case for this patch should be an access pattern that involves a very
long tail; actually, pgbench is a pretty good fit for that, assuming
the scale factor is large enough. For example, at scale factor 100,
we've got 10,000,000 tuples: choosing one at random, we're almost
exactly 90% likely to find one that hasn't been chosen by any of the
last 1,048,576 transactions (i.e. 32 CLOG pages @ 32K txns/page). In
terms of
reducing contention on the main CLOG SLRU, that sounds pretty
promising, but it depends somewhat on the rate at which transactions are
processed relative to the frequency of checkpoints, since that will
affect how many pages back you have to go to use the history path.
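
To spell that arithmetic out, here's a throwaway check (nothing to do
with the patch itself; scale and page-size figures are the ones used
above):

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        double ntuples = 10000000.0;        /* pgbench scale factor 100 */
        double recent = 32.0 * 32768.0;     /* 32 CLOG pages @ 32K txns/page */

        /*
         * Chance that a uniformly random tuple was not touched by any of
         * the last 'recent' transactions: (1 - 1/ntuples)^recent.
         */
        printf("%.4f\n", pow(1.0 - 1.0 / ntuples, recent)); /* ~0.9004 */
        return 0;
    }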

However, there is a potential fly in the ointment: in other cases in
which we've reduced contention at the LWLock layer, we've ended up
with very nasty contention at the spinlock layer that can sometimes
eat more CPU time than the LWLock contention did. In that light, it
strikes me that it would be nice to be able to partition the
contention N ways rather than just 2 ways. I think we could do that
as follows. Instead of having one control lock per SLRU, have N
locks, where N is probably a power of 2. Divide the buffer pool for
the SLRU N ways, and decree that each slice of the buffer pool is
controlled by one of the N locks. Route all requests for a page P to
slice P mod N. Unlike the approach this patch takes, partitioning
wouldn't completely eliminate contention at the LWLock level, but it
would reduce it in proportion to the number of partitions, and it
would reduce spinlock contention by a similar factor. A downside is
that you'd need more buffers to get the same hit rate, but the
current patch has the same problem: it doubles the amount of memory
allocated for CLOG. Of course, this approach is all vaporware right
now, so it's anybody's guess whether it would outperform the patch if
we had code for it. I'm just throwing it out there.
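
To make the shape of that idea concrete, here is a rough sketch (all
names invented; nothing like this exists in slru.c today):

    #include "storage/lwlock.h"

    #define SLRU_NUM_PARTITIONS 8   /* N; keep it a power of 2 */

    typedef struct SlruPartitionedCtlData
    {
        /* one control lock per slice of the buffer pool */
        LWLockId    partition_locks[SLRU_NUM_PARTITIONS];
        /* each slice owns num_buffers / SLRU_NUM_PARTITIONS buffers */
    } SlruPartitionedCtlData;

    /* Route all requests for page P to slice P mod N. */
    static int
    SlruPartition(int pageno)
    {
        return pageno & (SLRU_NUM_PARTITIONS - 1);
    }

    static LWLockId
    SlruPartitionLock(SlruPartitionedCtlData *ctl, int pageno)
    {
        return ctl->partition_locks[SlruPartition(pageno)];
    }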

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
