Re: Patch: fix lock contention for HASHHDR.mutex

From: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: fix lock contention for HASHHDR.mutex
Date: 2015-12-28 09:01:12
Message-ID: 20151228120112.6a847dfd@fujitsu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here is another preliminary result I would like to share. As always you
will find corresponding patch in attachment. It has work in progress
quality.

The idea was suggested by colleague of mine Aleksander Lebedev.

freeList is partitioned like in "no lock" patch. When there is no
enough free items in a freeList we borrow AN items from a global list.
When freeList become too large we return AN items back to global list.
This global list is also partitioned into PN partitions. Each partition
is protected by a spinlock.

This way we have less lock contention than in "lwlock" or "spinlock
array" versions since we borrow multiple free elements, not one a time.
Also in worst case only AN*(NUM_LOCK_PARTITIONS-1) free items are not
used instead of (Total/NUM_LOCK_PARTITIONS)*(NUM_LOCK_PARTITIONS-1).

On 60-core server amount of TPS depends on AN and PN like this:

| | | | | |
| AN = 1 | AN = 2 | AN = 4 | AN = 8 | AN =16 | AN =32
-------|--------|--------|--------|--------|--------|--------
| 733.0 | 1120.6 | 1605.5 | 1842.5 | 1545.5 | 1237.0
PN = 1 | 740.3 | 1127.0 | 1634.2 | 1800.8 | 1573.5 | 1245.1
| 742.9 | 1102.1 | 1647.2 | 1853.6 | 1533.4 | 1251.9
-------|--------|--------|--------|--------|--------|--------
| 1052.0 | 1438.1 | 1755.6 | 1981.0 | 2022.0 | 1816.8
PN = 2 | 1044.8 | 1453.1 | 1784.0 | 1958.3 | 2033.2 | 1819.2
| 1028.7 | 1419.8 | 1809.2 | 1981.2 | 2028.2 | 1790.2
-------|--------|--------|--------|--------|--------|--------
| 1182.0 | 1521.5 | 1813.2 | 1932.6 | 2035.2 | 1948.4
PN = 4 | 1212.4 | 1535.4 | 1816.8 | 1927.0 | 2018.7 | 2014.6
| 1189.4 | 1528.9 | 1816.9 | 1942.6 | 2011.9 | 2018.3
-------|--------|--------|--------|--------|--------|--------
| 1148.1 | 1522.2 | 1795.4 | 1926.6 | 2031.7 | 2015.6
PN = 8 | 1175.6 | 1529.4 | 1807.6 | 1913.5 | 2007.3 | 2062.0
| 1169.9 | 1528.0 | 1796.3 | 1926.0 | 2011.1 | 2042.8
-------|--------|--------|--------|--------|--------|--------
| 1117.7 | 1491.0 | 1803.9 | 1925.3 | 2029.4 | 2056.2
PN =16 | 1132.8 | 1481.0 | 1809.6 | 1968.1 | 2033.8 | 2068.5
| 1131.4 | 1481.8 | 1819.4 | 1946.2 | 2071.1 | 2073.8

AN = GLOBAL_FREE_LIST_ALLOC_NUMBER
PN = GLOBAL_FREE_LIST_PARTITIONS_NUM

There is no performance degradation on Core i7. By increasing PN or AN
any further we don't gain any more TPS.

As you may see this version is about 30% faster than "lwlock" or
"spinlock array" and 3.1 times faster than master. Still it's about 2.5
slower than "no locks" version which I find frustrating.

Next I will try to speedup this version by modifying global_free_list_*
procedures. Current implementations are not most efficient ones. Also
I'm planning to explore approaches which involve lock free algorithms.

I would like to know your opinion about such approach. For instance
could we spare AN*(NUM_LOCK_PARTITIONS-1) items in a worst case or we
can't by same reason we can't do it for (Total / NUM_LOCK_PARTITIONS) *
(NUM_LOCK_PARTITIONS-1)?

Attachment Content-Type Size
sharded-global-free-list.patch text/x-patch 15.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2015-12-28 09:38:59 Re: Freeze avoidance of very large table.
Previous Message David Rowley 2015-12-28 08:35:41 Patch to fix misspellings of the word "segment"