Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

From: Ants Aasma <ants(at)cybertec(dot)at>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Sergey Koposov <koposov(at)ast(dot)cam(dot)ac(dot)uk>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Date: 2012-06-02 01:55:23
Message-ID: CA+CSw_u77CC2f-EG1UUZOBprQedPQ_HE8K=fmcNutBr_mdXHOQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jun 2, 2012 at 1:48 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> Buffer pins aren't a cache: with a cache you are trying to mask a slow
> operation (like a disk i/o) with a faster one such that the number of slow
> operations is minimized.  Buffer pins however are very different in
> that we only care about contention on the reference count (the buffer
> itself is not locked!) which makes me suspicious that caching type
> algorithms are the wrong place to be looking.  I think it comes down to
> picking between your relatively complex but general lock displacement
> approach or a specific strategy based on known bottlenecks.

I agree that pins aren't like a cache. I mentioned the caching
algorithms because they work based on access frequency and highly
contended locks are likely to be accessed frequently even from a
single backend. However, this only makes sense for the delayed
unpinning method, and I have also come to the conclusion that it's not
likely to work well. Besides delaying cleanup, it adds too much overhead
to the common case of uncontended access.

It seems to me that even the nailing approach will need a replacement
algorithm. The local pins still need to be published globally and
because shared memory size is fixed, the maximum number of nailed
(locally pinned) buffers needs to be limited as well.

But anyway, I managed to completely misread the profile that Sergey
gave. Somehow I missed that the time went into the retry TAS in s_lock
instead of the inlined TAS. This shows that the issue isn't just
cacheline ping-pong but cacheline stealing. This could be somewhat
mitigated by making pinning lock-free. The Nb-GCLOCK paper that Robert
posted earlier in another thread describes an approach for this. I
have a WIP patch (attached) that makes the clock sweep lock-free in
the common case. This patch gave a 40% performance increase for an
extremely allocation-heavy load running with 64 clients on a 4-core,
1-socket system, and smaller gains across the board. Pinning has a
shorter lock duration (and a different lock type), so the gain might be
smaller; or contention on pinning might be a larger problem, giving a
higher gain. Either way, I think the nailing approach should be explored
further: cacheline ping-pong could still be a problem with a higher
number of processors, and losing the spinlock also loses the ability to
detect contention.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

Attachment: lockfree-getbuffer.patch (application/octet-stream, 7.7 KB)
