Re: Deleting older versions in unique indexes to avoid page splits

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2020-11-12 23:18:49
Message-ID: CAH2-Wz=z1kiHW4Z8YjOn84tiMjP5gt=LQH8=ti2NxBF=F5N1RQ@mail.gmail.com

On Thu, Nov 12, 2020 at 3:00 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Attached is v8, which has the enhancements for low cardinality data
> that I mentioned earlier today. It also simplifies the logic for
> dealing with posting lists that we need to delete some TIDs from.
> These posting list simplifications also make the code a bit more
> efficient, which might be noticeable during benchmarking.

One more thing: I repeated a pgbench test that was similar to my
earlier low cardinality tests -- same indexes (fiver, tenner, score,
aid_pkey_include_abalance). And same queries. But longer runs: 4 hours
each. Plus a larger DB: scale 2,500. Plus a rate-limit of 5000 TPS.
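
For anyone that wants to reproduce this: each run used settings roughly
equivalent to the following, where "lowcard.sql" is just a stand-in name
for the custom script -- its statements appear in the latency breakdowns
further down:

pgbench -i -s 2500
pgbench -M prepared -R 5000 -c 32 -j 8 -T 14400 -f lowcard.sql

(The 16 client runs simply swap in -c 16.)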

Here is the high level report, with 4 runs -- one pair with 16
clients, another pair with 32 clients:

2020-11-11 19:03:26 -0800 - Start of initial data load for run "patch.r1c16" (DB is also used by later runs)
2020-11-11 19:18:16 -0800 - End of initial data load for run "patch.r1c16"
2020-11-11 19:18:16 -0800 - Start of pgbench run "patch.r1c16"
2020-11-11 23:18:43 -0800 - End of pgbench run "patch.r1c16": patch.r1c16.bench.out: "tps = 4999.100006 (including connections establishing)" "latency average = 3.355 ms" "latency stddev = 58.455 ms"
2020-11-11 23:19:12 -0800 - Start of initial data load for run "master.r1c16" (DB is also used by later runs)
2020-11-11 23:34:33 -0800 - End of initial data load for run "master.r1c16"
2020-11-11 23:34:33 -0800 - Start of pgbench run "master.r1c16"
2020-11-12 03:35:01 -0800 - End of pgbench run "master.r1c16": master.r1c16.bench.out: "tps = 5000.061623 (including connections establishing)" "latency average = 8.591 ms" "latency stddev = 64.851 ms"
2020-11-12 03:35:41 -0800 - Start of pgbench run "patch.r1c32"
2020-11-12 07:36:10 -0800 - End of pgbench run "patch.r1c32": patch.r1c32.bench.out: "tps = 5000.141420 (including connections establishing)" "latency average = 1.253 ms" "latency stddev = 9.935 ms"
2020-11-12 07:36:40 -0800 - Start of pgbench run "master.r1c32"
2020-11-12 11:37:19 -0800 - End of pgbench run "master.r1c32": master.r1c32.bench.out: "tps = 5000.542942 (including connections establishing)" "latency average = 3.069 ms" "latency stddev = 24.640 ms"
2020-11-12 11:38:18 -0800 - Start of pgbench run "patch.r2c16"

We see a very significant latency advantage for the patch here. Here
is the breakdown of query latencies from the final patch run,
patch.r1c32:

scaling factor: 2500
query mode: prepared
number of clients: 32
number of threads: 8
duration: 14400 s
number of transactions actually processed: 72002280
latency average = 1.253 ms
latency stddev = 9.935 ms
rate limit schedule lag: avg 0.406 (max 694.645) ms
tps = 5000.141420 (including connections establishing)
tps = 5000.142503 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set aid1 random_gaussian(1, 100000 * :scale, 4.0)
0.001 \set aid2 random_gaussian(1, 100000 * :scale, 4.5)
0.001 \set aid3 random_gaussian(1, 100000 * :scale, 4.2)
0.001 \set bid random(1, 1 * :scale)
0.001 \set tid random(1, 10 * :scale)
0.001 \set delta random(-5000, 5000)
0.063 BEGIN;
0.361 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid1;
0.171 SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
0.172 SELECT abalance FROM pgbench_accounts WHERE aid = :aid3;
0.074 END;

Here is the equivalent for master:

scaling factor: 2500
query mode: prepared
number of clients: 32
number of threads: 8
duration: 14400 s
number of transactions actually processed: 72008125
latency average = 3.069 ms
latency stddev = 24.640 ms
rate limit schedule lag: avg 1.695 (max 1097.628) ms
tps = 5000.542942 (including connections establishing)
tps = 5000.544213 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set aid1 random_gaussian(1, 100000 * :scale, 4.0)
0.001 \set aid2 random_gaussian(1, 100000 * :scale, 4.5)
0.001 \set aid3 random_gaussian(1, 100000 * :scale, 4.2)
0.001 \set bid random(1, 1 * :scale)
0.001 \set tid random(1, 10 * :scale)
0.001 \set delta random(-5000, 5000)
0.078 BEGIN;
0.560 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid1;
0.320 SELECT abalance FROM pgbench_accounts WHERE aid = :aid2;
0.308 SELECT abalance FROM pgbench_accounts WHERE aid = :aid3;
0.102 END;

So even the UPDATE is much faster here.

This is also something we see with pg_statio_user_tables, which looked
like this by the end of the patch runs:

-[ RECORD 1 ]---+-----------------
schemaname | public
relname | pgbench_accounts
heap_blks_read | 117,384,599
heap_blks_hit | 1,051,175,835
idx_blks_read | 24,761,513
idx_blks_hit | 4,024,776,723

And here is how it looked by the end of the master runs:

-[ RECORD 1 ]---+-----------------
schemaname | public
relname | pgbench_accounts
heap_blks_read | 191,947,522
heap_blks_hit | 904,536,584
idx_blks_read | 65,653,885
idx_blks_hit | 4,002,061,803
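
(These figures come from a query along these lines, run in psql with
expanded output enabled via \x:)

SELECT schemaname, relname, heap_blks_read, heap_blks_hit,
       idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'pgbench_accounts';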

Notice that heap_blks_read is down from 191,947,522 on master to
117,384,599 with the patch -- so it's ~0.611x with the patch. A huge
reduction like this is possible because the patch effectively
amortizes the cost of accessing heap blocks to find garbage to clean
up ("nipping the [index bloat] problem in the bud" is much cheaper
than letting it get out of hand for many reasons; locality in
shared_buffers is just one more of them). The patch accesses garbage
tuples in heap blocks close together in time for all indexes, at a
point when the blocks are still likely to be found in shared_buffers.

Also notice that idx_blks_read is ~0.38x with the patch. That's less
important, but still significant.
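
The effect on the indexes themselves is also easy to check directly at
the end of each run, by comparing index sizes with something like:

SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_user_indexes
WHERE relname = 'pgbench_accounts';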

--
Peter Geoghegan
