Re: HACKERS[PROPOSAL] split ProcArrayLock into multiple parts

From: Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru>
To: Jim Van Fleet <vanfleet(at)us(dot)ibm(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: HACKERS[PROPOSAL] split ProcArrayLock into multiple parts
Date: 2017-06-07 22:38:30
Message-ID: 1677284f35c40af909c317c072339492@postgrespro.ru
Lists: pgsql-hackers

Good day Robert, Jim, and everyone.

On 2017-06-08 00:06, Jim Van Fleet wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote on 06/07/2017 12:12:02 PM:
>
>> > OK -- would love the feedback and any suggestions on how to
>> > mitigate the low end problems.
>>
>> Did you intend to attach a patch?
> Yes I do -- tomorrow or Thursday -- needs a little cleaning up ...
>
>> > Sokolov Yura has a patch which, to me, looks good for pgbench rw
>> > performance. Does not do so well with hammerdb (about the same as
>> > base) on single socket and two socket.
>>
>> Any idea why? I think we will have to understand *why* certain
>> things help in some situations and not others, not just *that* they
>> do, in order to come up with a good solution to this problem.

My patch improves acquiring a contended/blocking LWLock on NUMA because:
a. the patched procedure generates far fewer writes, mainly because
taking the WaitListLock is unified with acquiring the lock itself.
Access to modified memory is very expensive on NUMA, so fewer writes
mean less wasted time.
b. it spins on lock->state several times, attempting to acquire the
lock, before it starts trying to queue itself on the wait list (see
the sketch below). This spinning is the real cause of some of the
speedup; without it, the patch merely removes the degradation under
contention.
I don't know why the spinning doesn't improve single-socket performance
though :-) Probably because the algorithmic overhead (system calls,
putting the process to sleep and waking it up) is not that expensive
until NUMA is involved.
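
To make point (b) concrete, here is a minimal standalone sketch of the
spin-then-queue idea using C11 atomics. All names, the bit layout, and
the spin count are illustrative assumptions for this email, not the
actual patch code:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LW_FLAG_EXCLUSIVE  (1u << 31)  /* illustrative bit layout */
#define SPINS_BEFORE_QUEUE 100         /* hypothetical tuning knob */

typedef struct { _Atomic uint32_t state; } lwlock_t;

/* Try to grab the lock exclusively a bounded number of times before
 * giving up and letting the caller queue itself on the wait list. */
static bool
lwlock_try_spin_exclusive(lwlock_t *lock)
{
    for (int i = 0; i < SPINS_BEFORE_QUEUE; i++)
    {
        uint32_t old = atomic_load_explicit(&lock->state,
                                            memory_order_relaxed);
        if (old == 0 &&
            atomic_compare_exchange_weak_explicit(&lock->state, &old,
                                                  LW_FLAG_EXCLUSIVE,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;   /* acquired without touching the wait list */
        /* a real version would add a pause/cpu_relax() here */
    }
    return false;          /* caller now queues itself and sleeps */
}

The point is that the happy path is a read plus one CAS on a single
word: no writes to the wait list, and no separate WaitListLock traffic.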

> Looking at the data now -- LWLockAcquire philosophy is different --
> at first glance I would have guessed "about the same" as the base,
> but I cannot yet explain why we have super pgbench rw performance
> and "the same" hammerdb performance

My patch improves only blocking contention, i.e. when a lot of
EXCLUSIVE locks are involved. pgbench rw generates a lot of write
traffic, so there is a lot of contention and waiting on the
WALInsertLocks (in XLogInsertRecord, and waiting in XLogFlush),
WALWriteLock (in XLogFlush), and CLogControlLock (in
TransactionIdSetTreeStatus).

The case where SHARED locks are much more common than EXCLUSIVE ones
is not affected by the patch, because SHARED is then acquired on the
fast path in both the original and the patched version.
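
For reference, the shared fast path amounts to bumping a reader count
with a single compare-and-swap, continuing the illustrative sketch
above (same assumed lwlock_t and bit layout):

/* Shared fast path: one CAS that bumps a reader count in the low
 * bits, as long as the writer bit is clear. */
static bool
lwlock_try_shared(lwlock_t *lock)
{
    uint32_t old = atomic_load_explicit(&lock->state,
                                        memory_order_relaxed);
    while ((old & LW_FLAG_EXCLUSIVE) == 0)
    {
        if (atomic_compare_exchange_weak_explicit(&lock->state, &old,
                                                  old + 1, /* +1 reader */
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;   /* no wait-list traffic at all */
        /* a failed CAS refreshed 'old'; re-check the writer bit */
    }
    return false;          /* a writer holds the lock: slow path */
}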

So it looks like hammerdb doesn't produce much EXCLUSIVE contention on
LWLocks, and that is why it is not improved by the patch.

Splitting ProcArrayLock helps with acquiring a SHARED lock on NUMA in
the absence of EXCLUSIVE lockers for the same reason my patch improves
acquiring a blocking lock: fewer writes to the same memory. Since
every process writes to just one part of ProcArrayLock, each part
receives far fewer writes, so acquiring a SHARED lock pays less for
accessing modified memory on NUMA.
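
A toy sketch of that partitioning idea, reusing the illustrative
helpers above; the part count and the mapping from backend to part are
assumptions of mine, not Jim's actual patch:

#define PROCARRAY_PARTS 4  /* hypothetical part count */

typedef struct
{
    lwlock_t part[PROCARRAY_PARTS]; /* ideally padded a cache line apart */
} partitioned_lock_t;

/* SHARED: each backend only ever writes the state word of "its" part,
 * so any single cache line is dirtied by 1/PROCARRAY_PARTS of the
 * backends. */
static bool
partitioned_lock_shared(partitioned_lock_t *pl, int my_backend_id)
{
    return lwlock_try_shared(&pl->part[my_backend_id % PROCARRAY_PARTS]);
}

/* EXCLUSIVE: has to take every part, so it stays at least as
 * expensive as before -- the win is purely on the SHARED side. */
static bool
partitioned_lock_exclusive(partitioned_lock_t *pl)
{
    for (int i = 0; i < PROCARRAY_PARTS; i++)
    {
        if (!lwlock_try_spin_exclusive(&pl->part[i]))
        {
            /* back out the parts already taken */
            while (--i >= 0)
                atomic_store_explicit(&pl->part[i].state, 0,
                                      memory_order_release);
            return false;
        }
    }
    return true;
}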

Probably I'm mistaken somewhere.


--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company
