|From:||"Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>|
|To:||'Alexander Korotkov' <a(dot)korotkov(at)postgrespro(dot)ru>|
|Subject:||RE: Locking B-tree leafs immediately in exclusive mode|
On Mon, June 11, 2018 at 4:31 PM, Alexander Korotkov wrote:
> On Mon, Jun 11, 2018 at 1:06 PM Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On 5 June 2018 at 14:45, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
> > wrote:
> > > Currently _bt_search() always locks the leaf buffer in shared mode (aka
> > > BT_READ), while the caller can relock it later. However, I don't see
> > > what prevents _bt_search() from locking the leaf immediately in
> > > exclusive mode (aka BT_WRITE) when required.
> > > When traversing the downlink from a non-leaf page of level 1 (the
> > > lowest level of non-leaf pages, just above the leaf pages), we know
> > > that the next page is going to be a leaf. In this case, we can
> > > immediately lock the next page in exclusive mode if required.
> > > Of course, it might happen that we don't manage to exclusively lock
> > > the leaf this way, when _bt_getroot() points us directly to a leaf
> > > page. But that case can be handled in _bt_search() by relocking.
> > > Please find an implementation of this approach in the attached patch.
> > It's a good idea. How does it perform with many duplicate entries?
> Our B-tree currently keeps duplicates unordered. So, during insertion
> we may traverse rightlinks in order to find a page that would fit the
> new index tuple.
> However, in that case we're already traversing pages in exclusive lock
> mode after the relock, and _bt_search() doesn't change anything there.
> So, I assume there shouldn't be any degradation in the case of many
> duplicate entries.
> But this case is definitely worth testing, and I'm planning to do it.
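The descent logic discussed above can be sketched as a toy model (this is illustrative Python, not PostgreSQL's actual C implementation; the `Page`, `bt_search`, `BT_READ`, and `BT_WRITE` names only mirror nbtree terminology, and the tree structure is hypothetical):

```python
# Toy model of the proposed descent: hold ordinary (shared) locks while
# descending, but when the current page is at level 1 (its children are
# leaves), take the child's lock in exclusive mode right away for writes.

BT_READ, BT_WRITE = "shared", "exclusive"

class Page:
    def __init__(self, level, children=None):
        self.level = level              # 0 = leaf page
        self.children = children or []

def bt_search(root, access):
    """Descend to a leaf; return (leaf_page, lock_mode_held_on_leaf)."""
    page, mode = root, BT_READ
    while page.level > 0:
        # On a level-1 page the downlink points to a leaf: for a write
        # access, lock that child exclusively immediately instead of
        # taking a shared lock and relocking later.
        child_mode = BT_WRITE if (access == BT_WRITE and page.level == 1) \
                     else BT_READ
        page, mode = page.children[0], child_mode
    # Rare case: the root itself was a leaf, so we only hold a shared
    # lock; the patch handles this in _bt_search() by relocking.
    if access == BT_WRITE and mode != BT_WRITE:
        mode = BT_WRITE
    return page, mode

leaf = Page(0)
root = Page(2, [Page(1, [leaf])])
print(bt_search(root, BT_WRITE)[1])  # exclusive
```

The point of the sketch is only that the extra relock step disappears for the common case where the descent passes through a level-1 page.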
Hi, I'm reviewing this.
Firstly, I ran the same performance tests you did, on a 72-core machine (AWS c5.18xlarge).
> # postgresql.conf
> max_connections = 300
> shared_buffers = 32GB
> fsync = off
> synchronous_commit = off
> # DDL:
> CREATE UNLOGGED TABLE ordered (id serial primary key, value text not null);
> CREATE UNLOGGED TABLE unordered (i integer not null, value text not null);
> # script_ordered.sql
> INSERT INTO ordered (value) VALUES ('abcdefghijklmnoprsqtuvwxyz');
> # script_unordered.sql
> \set i random(1, 1000000)
> INSERT INTO unordered VALUES (:i, 'abcdefghijklmnoprsqtuvwxyz');
> # commands
> pgbench -T 60 -P 1 -M prepared -f script_ordered.sql -c 150 -j 150 postgres
> pgbench -T 60 -P 1 -M prepared -f script_unordered.sql -c 150 -j 150 postgres
> # results
> ordered, master: 157473 TPS
> ordered, patched 231374 TPS
> unordered, master: 232372 TPS
> unordered, patched: 232535 TPS
# my results
ordered, master: 186045 TPS
ordered, patched: 265842 TPS
unordered, master: 313011 TPS
unordered, patched: 309636 TPS
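As a sanity check on the figures above, the relative changes work out as follows (simple arithmetic on the reported TPS values):

```python
def pct(master, patched):
    """Percentage change of patched TPS relative to master TPS."""
    return round((patched - master) / master * 100, 1)

print(pct(157473, 231374))  # Alexander's ordered:  46.9 (% improvement)
print(pct(186045, 265842))  # my ordered:           42.9
print(pct(313011, 309636))  # my unordered:         -1.1 (within noise)
```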
I confirmed that this patch improves ordered insertion.
In addition to your tests, I did the following test with many duplicate entries.
# DDL
CREATE UNLOGGED TABLE duplicated (i integer not null, value text not null);
# script_duplicated.sql
INSERT INTO duplicated VALUES (1, 'abcdefghijklmnoprsqtuvwxyz');
# command
pgbench -T 60 -P 1 -M prepared -f script_duplicated.sql -c 150 -j 150 postgres
duplicated, master: 315450 TPS
duplicated, patched: 311584 TPS
It seems there is almost no performance degradation in the case of many duplicated entries.
I'm planning to do a code review and send it in the next mail.