Locking B-tree leafs immediately in exclusive mode

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Locking B-tree leafs immediately in exclusive mode
Date: 2018-06-05 13:45:19
Message-ID: CAPpHfduAMDFMNYTCN7VMBsFg_hsf0GqiqXnt+bSeaJworwFoig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi!

Currently _bt_search() always locks leaf buffer in shared mode (aka BT_READ),
while caller can relock it later. However, I don't see what prevents
_bt_search()
from locking leaf immediately in exclusive mode (aka BT_WRITE) when required.
When traversing downlink from non-leaf page of level 1 (lowest level of non-leaf
pages just prior to leaf pages), we know that next page is going to be leaf. In
this case, we can immediately lock next page in exclusive mode if required.
For sure, it might happen that we didn't manage to exclusively lock leaf in this
way when _bt_getroot() points us to leaf page. But this case could be handled
in _bt_search() by relock. Please, find implementation of this approach in the
attached patch.

I've run following simple test of this patch on 72-cores machine.

# postgresql.conf
max_connections = 300
shared_buffers = 32GB
fsync = off
synchronous_commit = off

# DDL:
CREATE UNLOGGED TABLE ordered (id serial primary key, value text not null);
CREATE UNLOGGED TABLE unordered (i integer not null, value text not null);

# script_ordered.sql
INSERT INTO ordered (value) VALUES ('abcdefghijklmnoprsqtuvwxyz');

# script_unordered.sql
\set i random(1, 1000000)
INSERT INTO unordered VALUES (:i, 'abcdefghijklmnoprsqtuvwxyz');

# commands
pgbench -T 60 -P 1 -M prepared -f script_ordered.sql -c 150 -j 150 postgres
pgbench -T 60 -P 1 -M prepared -f script_unordered.sql -c 150 -j 150 postgres

# results
ordered, master: 157473 TPS
ordered, patched 231374 TPS
unordered, master: 232372 TPS
unordered, patched: 232535 TPS

As you can see, difference in unordered case is negligible Due to random
inserts, concurrency for particular leafs is low. But ordered
insertion is almost
50% faster on patched version.

I wonder how could we miss such a simple optimization till now, but I also don't
see this patch to brake anything.

In patched version, it might appear that we have to traverse
rightlinks in exclusive
mode due to splits concurrent to downlink traversal. However, the same might
happen in current master due to splits concurrent to relocks. So, I
don't expect
performance regression to be caused by this patch.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
btree-search-write-lock-1.patch application/octet-stream 4.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-06-05 13:57:15 Re: Spilling hashed SetOps and aggregates to disk
Previous Message serge 2018-06-05 13:41:45 RE: Re: Spilling hashed SetOps and aggregates to disk