Re: Locking B-tree leafs immediately in exclusive mode

From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Locking B-tree leafs immediately in exclusive mode
Date: 2018-06-11 08:51:46
Message-ID: CAOGQiiNtLyFvqZeTwnV4yWyvk2oRP_j0nCcddJb6k3GHkBgvsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 5, 2018 at 7:15 PM, Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> Hi!
>
> Currently _bt_search() always locks leaf buffer in shared mode (aka BT_READ),
> while caller can relock it later. However, I don't see what prevents
> _bt_search()
> from locking leaf immediately in exclusive mode (aka BT_WRITE) when required.
> When traversing downlink from non-leaf page of level 1 (lowest level of non-leaf
> pages just prior to leaf pages), we know that next page is going to be leaf. In
> this case, we can immediately lock next page in exclusive mode if required.
> For sure, it might happen that we didn't manage to exclusively lock leaf in this
> way when _bt_getroot() points us to leaf page. But this case could be handled
> in _bt_search() by relock. Please, find implementation of this approach in the
> attached patch.
>
> I've run following simple test of this patch on 72-cores machine.
>
> # postgresql.conf
> max_connections = 300
> shared_buffers = 32GB
> fsync = off
> synchronous_commit = off
>
> # DDL:
> CREATE UNLOGGED TABLE ordered (id serial primary key, value text not null);
> CREATE UNLOGGED TABLE unordered (i integer not null, value text not null);
>
> # script_ordered.sql
> INSERT INTO ordered (value) VALUES ('abcdefghijklmnoprsqtuvwxyz');
>
> # script_unordered.sql
> \set i random(1, 1000000)
> INSERT INTO unordered VALUES (:i, 'abcdefghijklmnoprsqtuvwxyz');
>
> # commands
> pgbench -T 60 -P 1 -M prepared -f script_ordered.sql -c 150 -j 150 postgres
> pgbench -T 60 -P 1 -M prepared -f script_unordered.sql -c 150 -j 150 postgres
>
> # results
> ordered, master: 157473 TPS
> ordered, patched 231374 TPS
> unordered, master: 232372 TPS
> unordered, patched: 232535 TPS
>
> As you can see, difference in unordered case is negligible Due to random
> inserts, concurrency for particular leafs is low. But ordered
> insertion is almost
> 50% faster on patched version.
>
> I wonder how could we miss such a simple optimization till now, but I also don't
> see this patch to brake anything.
>
> In patched version, it might appear that we have to traverse
> rightlinks in exclusive
> mode due to splits concurrent to downlink traversal. However, the same might
> happen in current master due to splits concurrent to relocks. So, I
> don't expect
> performance regression to be caused by this patch.
>

I had a look at this patch and it looks good to me.
With the improvement in performance you mentioned, this optimisation
looks worthy.
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2018-06-11 09:15:39 Re: hot_standby_feedback vs excludeVacuum and snapshots
Previous Message David Rowley 2018-06-11 07:49:04 Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian