Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Sergey Koposov <koposov(at)ast(dot)cam(dot)ac(dot)uk>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Date: 2012-05-31 19:25:28
Message-ID: CAHyXU0wH-2L=DHOXQmiDHBGRXxODCnQCHXB9rxcU1ju-mqksFQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 31, 2012 at 1:50 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, May 31, 2012 at 2:03 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> On Thu, May 31, 2012 at 11:54 AM, Sergey Koposov <koposov(at)ast(dot)cam(dot)ac(dot)uk> wrote:
>>> On Thu, 31 May 2012, Robert Haas wrote:
>>>
>>>> Oh, ho.  So from this we can see that the problem is that we're
>>>> getting huge amounts of spinlock contention when pinning and unpinning
>>>> index pages.
>>>>
>>>> It would be nice to have a self-contained reproducible test case for
>>>> this, so that we could experiment with it on other systems.
>>>
>>>
>>> I created it a few days ago:
>>> http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php
>>>
>>> It is still valid, and it is exactly what I am using to test. The only
>>> thing to change is to create a two-column index and drop another index.
>>> The scripts are precisely the ones I'm using now.
>>>
>>> The problem is that in order to see a really big slowdown (10 times slower
>>> than a single thread) I had to raise shared_buffers to 48GB, but it was
>>> slow for smaller shared_buffers settings as well.
>>>
>>> But I'm not sure how sensitive the test is to the hardware.
>>
>> It's not: high contention on spinlocks is going to suck no matter what
>> hardware you have.   I think the problem is pretty obvious now: any
>> case where multiple backends are scanning the same sequence of buffers
>> in a very tight loop is going to display this behavior.  It doesn't
>> come up that often: it takes a pretty unusual sequence of events to
>> get a bunch of backends hitting the same buffer like that.
>>
>> Hm, I wonder if you could alleviate the symptoms by making the
>> Pin/UnpinBuffer smarter so that frequently pinned buffers could stay
>> pinned longer -- kinda as if your private ref count was hacked to be
>> higher in that case.   It would be a complex fix for a narrow issue
>> though.
>
> This test case is unusual because it hits a whole series of buffers
> very hard.  However, there are other cases where this happens on a
> single buffer that is just very, very hot, like the root block of a
> btree index, where the pin/unpin overhead hurts us.  I've been
> thinking about this problem for a while, but it hasn't made it up to
> the top of my priority list, because workloads where pin/unpin is the
> dominant cost are still relatively uncommon.  I expect them to get
> more common as we fix other problems.
>
> Anyhow, I do have some vague thoughts on how to fix this.  Buffer pins
> are a lot like weak relation locks, in that they are a type of lock
> that is taken frequently, but rarely conflicts.  And the fast-path
> locking in 9.2 provides a demonstration of how to handle this kind of
> problem efficiently: making the weak, rarely-conflicting locks
> cheaper, at the cost of some additional expense when a conflicting
> lock (in this case, a buffer cleanup lock) is taken.  In particular,
> each backend has its own area to record weak relation locks, and a
> strong relation lock must scan all of those areas and migrate any
> locks found there to the main lock table.  I don't think it would be
> feasible to adopt exactly this solution for buffer pins, because page
> eviction and buffer cleanup locks, while not exactly common, are
> common enough that we can't require a scan of N per-backend areas
> every time one of those operations occurs.
>
> But, maybe we could have a system of this type that only applies to
> the very hottest buffers.  Suppose we introduce two new buffer flags,
> BUF_NAILED and BUF_NAIL_REMOVAL.  When we detect excessive contention
> on the buffer header spinlock, we set BUF_NAILED.  Once we do that,
> the buffer can't be evicted until that flag is removed, and backends
> are permitted to record pins in a per-backend area protected by a
> per-backend spinlock or lwlock, rather than in the buffer header.
> When we want to un-nail the buffer, we set BUF_NAIL_REMOVAL.
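
If I'm reading that right, the pin path would look very roughly like the
toy sketch below.  To be clear, this is all made up just to frame my
questions -- none of these names, flags, or structures exist in the tree,
and I'm hand-waving the locking that would make the flag test safe:

#include <stdbool.h>
#include <stdint.h>

/* hypothetical flag bits in the (toy) buffer header */
#define BUF_NAILED        (1 << 0)
#define BUF_NAIL_REMOVAL  (1 << 1)

typedef struct ToyBufferDesc
{
    uint32_t    flags;       /* protected by the buffer header spinlock */
    int         refcount;    /* shared pin count, as today */
} ToyBufferDesc;

#define MAX_LOCAL_NAILED 16  /* size of the per-backend pin area */

typedef struct ToyBackendPins
{
    /* protected by a per-backend spinlock/lwlock, not the header lock */
    int         buffer_ids[MAX_LOCAL_NAILED];
    int         pin_counts[MAX_LOCAL_NAILED];
} ToyBackendPins;

/*
 * Pin a buffer.  If it is nailed and nail removal is not in progress,
 * record the pin in our per-backend area and never touch the shared
 * header; otherwise fall back to bumping the shared refcount.
 */
static bool
toy_pin_buffer(ToyBufferDesc *buf, ToyBackendPins *my_pins, int buf_id)
{
    if ((buf->flags & BUF_NAILED) && !(buf->flags & BUF_NAIL_REMOVAL))
    {
        int         free_slot = -1;

        for (int i = 0; i < MAX_LOCAL_NAILED; i++)
        {
            if (my_pins->pin_counts[i] > 0 && my_pins->buffer_ids[i] == buf_id)
            {
                my_pins->pin_counts[i]++;   /* already pinned locally */
                return true;
            }
            if (my_pins->pin_counts[i] == 0 && free_slot < 0)
                free_slot = i;
        }
        if (free_slot >= 0)
        {
            my_pins->buffer_ids[free_slot] = buf_id;
            my_pins->pin_counts[free_slot] = 1;
            return true;                    /* fast path, no header traffic */
        }
        /* local area full: fall through to the shared header */
    }
    buf->refcount++;        /* slow path, under the header spinlock as now */
    return false;
}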

Hm, a couple of questions: how do you determine if/when to un-nail a
buffer, and who makes that decision -- the bgwriter?  Is there a limit
to how many buffers you are allowed to nail?  It seems like a much
stronger idea, but one downside I see vs the 'pin for longer' idea I
was kicking around is how to deal with stale nailed buffers and keep
their number from growing uncontrollably to the point where you have
to either stop nailing or start forcibly evicting them.
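
To make the "uncontrolled growth" worry concrete, continuing the same
made-up sketch: one way to keep it bounded would be a hard cap plus an
un-nail step that has to sweep every backend's local area -- again,
hypothetical names only, not a proposal for actual code:

#define MAX_NAILED_BUFFERS 64       /* arbitrary cap on nailed buffers */

static int  n_nailed_buffers = 0;   /* imagine this in shared memory */

/* Nail a hot buffer only while we are under the cap; past the cap we
 * just keep living with the header-spinlock contention. */
static bool
toy_try_nail(ToyBufferDesc *buf)
{
    if (n_nailed_buffers >= MAX_NAILED_BUFFERS)
        return false;               /* stop nailing */
    buf->flags |= BUF_NAILED;
    n_nailed_buffers++;
    return true;
}

/* Un-nailing: forbid new local fast-path pins, then fold every
 * backend's local pin counts back into the shared refcount before
 * clearing the flags.  Who calls this (bgwriter?  whoever wants a
 * cleanup lock?) is exactly my question above. */
static void
toy_unnail(ToyBufferDesc *buf, ToyBackendPins *all_pins, int nbackends,
           int buf_id)
{
    buf->flags |= BUF_NAIL_REMOVAL;

    for (int b = 0; b < nbackends; b++)
    {
        for (int i = 0; i < MAX_LOCAL_NAILED; i++)
        {
            if (all_pins[b].pin_counts[i] > 0 &&
                all_pins[b].buffer_ids[i] == buf_id)
            {
                buf->refcount += all_pins[b].pin_counts[i];
                all_pins[b].pin_counts[i] = 0;
            }
        }
    }
    buf->flags &= ~(BUF_NAILED | BUF_NAIL_REMOVAL);
    n_nailed_buffers--;
}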

merlin
