Re: reducing the overhead of frequent table locks - now, with WIP patch

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing the overhead of frequent table locks - now, with WIP patch
Date: 2011-06-05 21:46:32
Message-ID: BANLkTimFkPJB_mL=b2noGXg55f1v7FObDw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 5, 2011 at 4:01 PM, Stefan Kaltenbrunner
<stefan(at)kaltenbrunner(dot)cc> wrote:
> On 06/05/2011 09:12 PM, Heikki Linnakangas wrote:
>> On 05.06.2011 22:04, Stefan Kaltenbrunner wrote:
>>> and one for the -j80 case (also patched).
>>>
>>>
>>> 485798   48.9667  postgres                 s_lock
>>> 60327     6.0808  postgres                 LWLockAcquire
>>> 57049     5.7503  postgres                 LWLockRelease
>>> 18357     1.8503  postgres                 hash_search_with_hash_value
>>> 17033     1.7169  postgres                 GetSnapshotData
>>> 14763     1.4881  postgres                 base_yyparse
>>> 14460     1.4575  postgres                 SearchCatCache
>>> 13975     1.4086  postgres                 AllocSetAlloc
>>> 6416      0.6467  postgres                 PinBuffer
>>> 5024      0.5064  postgres                 SIGetDataEntries
>>> 4704      0.4741  postgres                 core_yylex
>>> 4625      0.4662  postgres                 _bt_compare
>>
>> Hmm, does that mean that it's spending 50% of the time spinning on a
>> spinlock? That's bad. It's one thing to be contended on a lock, and have
>> a lot of idle time because of that, but it's even worse to spend a lot
>> of time spinning, because that CPU time won't be spent doing more
>> useful work, even if there is some other process on the system that
>> could make use of that CPU time.
>
> Well, yeah: we are broken right now, being able to use only ~20% of the
> CPU on a modern mid-range box, but using 80% of the CPU (4x, as in the
> above case) and getting less than 2x the performance seems wrong as
> well. I also wonder whether we are still missing something fundamental,
> because even with the current patch we are quite far from linear
> scaling and light-years from some of our competitors...
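
For a rough sense of scale, the figures quoted above can be combined with a little arithmetic. The sketch below is plain Python and the function names are illustrative only, not part of any PostgreSQL tool. It backs out the total sample count from the s_lock row, sums the three locking-related rows, and expresses "4x the CPU for less than 2x the throughput" as a per-CPU efficiency.

```python
# Rows copied from the -j80 profile quoted above: (samples, percent, symbol).
LOCKING_ROWS = [
    (485798, 48.9667, "s_lock"),
    (60327, 6.0808, "LWLockAcquire"),
    (57049, 5.7503, "LWLockRelease"),
]


def implied_total_samples(samples, percent):
    """Back out the profile's total sample count from a single row."""
    return samples / (percent / 100.0)


def combined_share(rows):
    """Percentage of all samples spent in the given rows."""
    return sum(percent for _, percent, _ in rows)


def scaling_efficiency(speedup, cpu_multiple):
    """Throughput gained per unit of extra CPU; 1.0 means linear scaling."""
    return speedup / cpu_multiple


total = implied_total_samples(*LOCKING_ROWS[0][:2])
print(f"total samples ~ {total:,.0f}")
print(f"locking share: {combined_share(LOCKING_ROWS):.1f}%")
print(f"4x CPU, 2x throughput -> efficiency {scaling_efficiency(2, 4):.2f}")
```

On these numbers, about 61% of roughly 992,000 samples land in s_lock, LWLockAcquire and LWLockRelease combined. A 2x speedup for 4x the CPU is an efficiency of 0.5, and since the reported speedup was under 2x, the real figure is lower still.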

Could you compile with LWLOCK_STATS, rerun these tests, total up the
"blk" numbers by LWLockId, and post the results? (Actually, totalling
up the shacq and exacq numbers would be useful as well, if you
wouldn't mind.)
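
One way to do the totalling requested above is a small script over the server's stderr log. This is a sketch under an assumption: each backend built with LWLOCK_STATS is taken to print lines shaped like `PID 12345 lwlock 44: shacq 10 exacq 3 blk 1` at exit. The script name and helper functions are hypothetical, and the regular expression should be adjusted if your build's output differs.

```python
import re
import sys
from collections import defaultdict

# ASSUMPTION: a backend built with LWLOCK_STATS writes one stderr line per
# LWLock it used, shaped like:
#   PID 12345 lwlock 44: shacq 10 exacq 3 blk 1
LINE_RE = re.compile(r"PID (\d+) lwlock (\d+): shacq (\d+) exacq (\d+) blk (\d+)")


def total_by_lock(lines):
    """Sum shacq, exacq and blk over all backends, keyed by LWLockId."""
    totals = defaultdict(lambda: {"shacq": 0, "exacq": 0, "blk": 0})
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # ordinary log output interleaved with the stats
        _pid, lock_id, shacq, exacq, blk = map(int, m.groups())
        t = totals[lock_id]
        t["shacq"] += shacq
        t["exacq"] += exacq
        t["blk"] += blk
    return dict(totals)


def format_report(totals):
    """One line per lock, most-blocked first."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1]["blk"], reverse=True)
    return [
        f"lwlock {lock_id}: shacq {t['shacq']} exacq {t['exacq']} blk {t['blk']}"
        for lock_id, t in ranked
    ]


# Only runs when given a log file, e.g. python3 lwlock_totals.py postmaster.log
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as log:
        for report_line in format_report(total_by_lock(log)):
            print(report_line)
```

Run as `python3 lwlock_totals.py postmaster.log`, it prints one line per LWLockId, most-blocked first. Mapping the numeric IDs back to named locks (buffer mapping, lock manager partitions, and so on) still has to be done by hand against the LWLockId enum in src/include/storage/lwlock.h.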

Unless I very much miss my guess, we're going to see zero contention
on the new structures introduced by this patch. Rather, I suspect
what we're going to find is that, with the hideous contention on one
particular lock manager partition lock removed, there's a more
spread-out contention problem, likely involving the lock manager
partition locks, the buffer mapping locks, and possibly other LWLocks
as well. The fact that the system is busy-waiting rather than just
not using the CPU at all probably means that the remaining contention
is more spread out than that which is removed by this patch. We don't
actually have everything pile up on a single LWLock (as happens in git
master), but we do spend a lot of time fighting cache lines away from
other CPUs. Or at any rate, that's my guess: we need some real
numbers to know for sure.
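
Once per-lock totals exist, the question above (one LWLock that everything piles up on, versus contention spread across many) can be reduced to a couple of numbers. A hedged sketch with a hypothetical helper: given blk totals keyed by LWLockId, report the hottest lock's share of all blocking and how many locks it takes to account for a chosen percentage of it.

```python
def concentration(blk_by_lock, coverage_pct=80):
    """Summarize how concentrated LWLock blocking is.

    Returns (hottest lock id, its fraction of all blocking, number of locks
    needed to account for coverage_pct percent of all blocking).
    """
    total = sum(blk_by_lock.values())
    if total == 0:
        return None, 0.0, 0
    ranked = sorted(blk_by_lock.items(), key=lambda kv: kv[1], reverse=True)
    hottest_id, hottest_blk = ranked[0]
    running = 0
    needed = 0
    for _, blk in ranked:
        running += blk
        needed += 1
        # Integer comparison avoids floating-point edge cases at the threshold.
        if running * 100 >= coverage_pct * total:
            break
    return hottest_id, hottest_blk / total, needed
```

For example, `concentration({44: 900, 45: 50, 46: 50})` gives `(44, 0.9, 1)`, the single-hot-lock pattern, while ten locks with 10 blocks each give `(0, 0.1, 8)`, which would fit the spread-out picture guessed at here.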

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
