Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-12-05 00:54:58
Message-ID: CAH2-WzkXHhjhmUYfVvu6afbojU97MST8RUT1U=hLd2W-GC5FNA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 3, 2019 at 12:13 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> The new criteria/heuristic for unique indexes is very simple: If a
> unique index has an existing item that is a duplicate of the incoming
> item at the point that we might have to split the page, then apply
> deduplication. Otherwise (when the incoming item has no duplicates),
> don't apply deduplication at all -- just accept that we'll have to
> split the page.
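
To make the quoted rule concrete, here is a minimal sketch of the decision
in Python. It is a toy model and not the nbtree C code from the patch: the
function name, the plain sorted list standing in for a leaf page, and the
integer keys are all assumptions made for illustration.

```python
from bisect import bisect_left
from typing import Sequence


def should_dedup_before_split(page_keys: Sequence[int], incoming_key: int) -> bool:
    """Toy model of the unique-index heuristic described above.

    page_keys is the sorted list of keys on a leaf page that has no room
    left for incoming_key (hypothetical representation). Returns True if a
    deduplication pass should be tried first, and False if the page should
    simply be split.
    """
    # Binary search for the first existing key >= incoming_key.
    pos = bisect_left(page_keys, incoming_key)
    # Deduplicate only when an existing item duplicates the incoming item.
    return pos < len(page_keys) and page_keys[pos] == incoming_key
```

For example, should_dedup_before_split([1, 2, 2, 3], 2) is True (the page
already holds a duplicate of the incoming key), while
should_dedup_before_split([1, 2, 3], 4) is False, so the page is split
without attempting deduplication.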

> the working/draft version of the patch will often avoid a huge amount of
> bloat in a pgbench-style workload that has an extra index on the
> pgbench_accounts table, to prevent HOT updates. The accounts primary
> key (pgbench_accounts_pkey) hardly grows at all with the patch, but
> grows 2x on master.

I have numbers from my benchmark against my working copy of the patch,
with this enhanced design for unique index deduplication.

My benchmark is a variant of the standard pgbench TPC-B-like benchmark.
It adds an extra index on the pgbench_accounts.abalance column (configured
to not use deduplication for the test), and it picks the aid variable
(i.e. the target of UPDATEs on pgbench_accounts) using a skewed
distribution. The pgbench script I used was as follows:

\set r random_gaussian(1, 100000 * :scale, 4.0)
\set aid abs(hash(:r)) % (100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES
(:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;

Results from interleaved 2-hour runs at pgbench scale 5,000 are as
follows (shown in reverse chronological order):

master_2_run_16.out: "tps = 7263.948703 (including connections establishing)"
patch_2_run_16.out: "tps = 7505.358148 (including connections establishing)"
master_1_run_32.out: "tps = 9998.868764 (including connections establishing)"
patch_1_run_32.out: "tps = 9781.798606 (including connections establishing)"
master_1_run_16.out: "tps = 8812.269270 (including connections establishing)"
patch_1_run_16.out: "tps = 9455.476883 (including connections establishing)"
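
For a quick side-by-side reading of these figures, the sketch below pairs
each patch run with the master run of the same name and reports the
relative tps difference. It only reuses the six lines above. Treating the
two numbers in each file name as a (series, suffix) label is an assumption
about the naming scheme, and the function names are made up for
illustration.

```python
import re

# The six result lines exactly as reported above.
RESULTS = """\
master_2_run_16.out: "tps = 7263.948703 (including connections establishing)"
patch_2_run_16.out: "tps = 7505.358148 (including connections establishing)"
master_1_run_32.out: "tps = 9998.868764 (including connections establishing)"
patch_1_run_32.out: "tps = 9781.798606 (including connections establishing)"
master_1_run_16.out: "tps = 8812.269270 (including connections establishing)"
patch_1_run_16.out: "tps = 9455.476883 (including connections establishing)"
"""

LINE_RE = re.compile(
    r'^(master|patch)_(\d+)_run_(\d+)\.out: "tps = ([\d.]+)', re.MULTILINE
)


def parse_tps(text):
    """Map (branch, series, suffix) -> tps for every result line in text."""
    return {
        (m.group(1), m.group(2), m.group(3)): float(m.group(4))
        for m in LINE_RE.finditer(text)
    }


def patch_vs_master(results):
    """Percent tps difference of each patch run relative to its master run."""
    diffs = {}
    for (branch, series, suffix), tps in results.items():
        if branch == "patch":
            base = results[("master", series, suffix)]
            diffs[(series, suffix)] = 100.0 * (tps - base) / base
    return diffs


if __name__ == "__main__":
    for key, pct in sorted(patch_vs_master(parse_tps(RESULTS)).items()):
        print(f"run {key[0]}, suffix {key[1]}: {pct:+.2f}% vs master")
```

On these numbers the patch is roughly 7.3% ahead in 1_run_16, 2.2% behind
in 1_run_32, and 3.3% ahead in 2_run_16.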

The patch comes out ahead in the first 2-hour run, with later runs
looking like a more even match. I think that each run didn't last long
enough to even out the effects of autovacuum, but this is really about
index size rather than overall throughput, so it's not that important.
(I need to get a large server to do further performance validation
work, rather than just running overnight benchmarks on my main work
machine like this.)

The primary key index (pgbench_accounts_pkey) starts out at 10.45 GiB
in size, and ends at 12.695 GiB with the patch. With master, it also
starts out at 10.45 GiB, but finishes at 19.392 GiB.
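
The arithmetic behind those sizes can be checked with a few lines of
Python. The sketch uses only the three sizes quoted above. The constant
and function names are illustrative, not taken from the patch or from the
attached logs.

```python
# Sizes of pgbench_accounts_pkey in GiB, as quoted above.
START_GIB = 10.45
END_PATCH_GIB = 12.695
END_MASTER_GIB = 19.392


def growth_factor(start: float, end: float) -> float:
    """How many times larger the index is at the end than at the start."""
    return end / start


def summarize() -> dict:
    """Derive the relative figures implied by the three quoted sizes."""
    return {
        "patch_growth": growth_factor(START_GIB, END_PATCH_GIB),
        "master_growth": growth_factor(START_GIB, END_MASTER_GIB),
        # Final patch size as a fraction of the final master size.
        "patch_over_master": END_PATCH_GIB / END_MASTER_GIB,
        # Absolute growth over the benchmark, in GiB.
        "patch_added_gib": END_PATCH_GIB - START_GIB,
        "master_added_gib": END_MASTER_GIB - START_GIB,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

This works out to about 1.21x growth with the patch versus about 1.86x on
master, with the patched index ending at roughly 65% of the master size
(about 2.2 GiB added versus about 8.9 GiB).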

Clearly this is a significant difference -- the index is only ~65% of
its master-branch size with the patch. See attached tar archive with
logs, and pg_buffercache output after each run. (The extra index on
pgbench_accounts.abalance is pretty much the same size for
patch/master, since deduplication was disabled for the patch runs.)
And, as I said, I believe that we can make this unique index
deduplication stuff an internal thing that isn't even documented
(maybe a passing reference is appropriate when talking about general
deduplication).

--
Peter Geoghegan

Attachment Content-Type Size
overnight-benchmark.tar.gz application/x-gzip 507.0 KB
