Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2019-03-19 01:51:43
Message-ID: CAH2-Wz=WY-ghVTdjtp7RxQgrz3bTkEUxveOv-ZM8+RcY9m4wSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 18, 2019 at 5:12 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Smarter choices on page splits pay off with higher client counts
> because they reduce contention at likely hot points. It's kind of
> crazy that the code in _bt_check_unique() sometimes has to move right,
> while holding an exclusive buffer lock on the original page and a
> shared buffer lock on its sibling page at the same time. It then has
> to hold a third buffer lock concurrently, this time on any heap pages
> it is interested in.

Actually, by the time we get to 16 clients, this workload does make
the indexes and tables smaller. Here is pg_buffercache output
collected after the first 16 client case:

Master
======

relname │ relforknumber │
size_main_rel_fork_blocks │ buffer_count │ avg_buffer_usg
─────────────────────────────────────────┼───────────────┼───────────────────────────┼──────────────┼────────────────────────
pgbench_history │ 0 │
123,484 │ 123,484 │ 4.9989715266755207
pgbench_accounts │ 0 │
34,665 │ 10,682 │ 4.4948511514697622
pgbench_accounts_pkey │ 0 │
5,708 │ 1,561 │ 4.8731582319026265
pgbench_tellers │ 0 │
489 │ 489 │ 5.0000000000000000
pgbench_branches │ 0 │
284 │ 284 │ 5.0000000000000000
pgbench_tellers_pkey │ 0 │
56 │ 56 │ 5.0000000000000000
....

Patch
=====

relname │ relforknumber │
size_main_rel_fork_blocks │ buffer_count │ avg_buffer_usg
─────────────────────────────────────────┼───────────────┼───────────────────────────┼──────────────┼────────────────────────
pgbench_history │ 0 │
127,864 │ 127,864 │ 4.9980447975974473
pgbench_accounts │ 0 │
33,933 │ 9,614 │ 4.3517786561264822
pgbench_accounts_pkey │ 0 │
5,487 │ 1,322 │ 4.8857791225416036
pgbench_tellers │ 0 │
204 │ 204 │ 4.9803921568627451
pgbench_branches │ 0 │
198 │ 198 │ 4.3535353535353535
pgbench_tellers_pkey │ 0 │
14 │ 14 │ 5.0000000000000000
....

The main fork for pgbench_history is larger with the patch, obviously,
but that's good. pgbench_accounts_pkey is about 4% smaller, which is
probably the most interesting observation that can be made here, but
the tables are also smaller. pgbench_accounts itself is ~2% smaller.
pgbench_branches is ~30% smaller, and pgbench_tellers is 60% smaller.
Of course, the smaller tables were already very small, so maybe that
isn't important. I think that this is due to more effective pruning,
possibly because we get better lock arbitration as a consequence of
better splits, and also because duplicates are in heap TID order. I
haven't observed this effect with larger databases, which have been my
focus.

It isn't weird that shared_buffers doesn't have all the
pgbench_accounts blocks, since, of course, this is highly skewed by
design -- most blocks were never accessed from the table.

This effect seems to be robust, at least with this workload. The
second round of benchmarks (which have their own pgbench -i
initialization) show very similar amounts of bloat at the same point.
It may not be that significant, but it's also not a fluke.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuro Yamada 2019-03-19 02:02:57 Re: [HACKERS] CLUSTER command progress monitor
Previous Message Tatsuro Yamada 2019-03-19 01:43:23 Re: [HACKERS] CLUSTER command progress monitor