Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2023-02-23 09:40:58
Message-ID: CAFBsxsHrvTPUK=C1=xweJjGujja4Xjfgva3C8jnW3Shz6RBnFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I ran a couple "in situ" tests on server hardware using UUID columns, since
they are common in the real world and have bad correlation to heap
order, so are a challenge for index vacuum.

=== test 1, delete everything from a small table, with very small
maintenance_work_mem:

alter system set shared_buffers ='4GB';
alter system set max_wal_size ='10GB';
alter system set checkpoint_timeout ='30 min';
alter system set autovacuum =off;

-- unrealistically low
alter system set maintenance_work_mem = '32MB';

create table if not exists test (x uuid);
truncate table test;
insert into test (x) select gen_random_uuid() from
generate_series(1,50*1000*1000);
create index on test (x);

delete from test;
vacuum (verbose, truncate off) test;
--

master:
INFO: finished vacuuming "john.naylor.public.test": index scans: 9
system usage: CPU: user: 70.04 s, system: 19.85 s, elapsed: 802.06 s

v29 patch:
INFO: finished vacuuming "john.naylor.public.test": index scans: 1
system usage: CPU: user: 9.80 s, system: 2.62 s, elapsed: 36.68 s

This is a bit artificial, but it's easy to construct cases where the array
leads to multiple index scans but the new tid store can fit everythin
without breaking a sweat. I didn't save the progress reporting, but v29 was
using about 11MB for tid storage.

=== test 2: try to stress tid lookup with production maintenance_work_mem:
1. use unlogged table to reduce noise
2. vacuum freeze first to reduce heap scan time
3. delete some records at the beginning and end of heap to defeat binary
search's pre-check

alter system set shared_buffers ='4GB';
alter system set max_wal_size ='10GB';
alter system set checkpoint_timeout ='30 min';
alter system set autovacuum =off;

alter system set maintenance_work_mem = '1GB';

create unlogged table if not exists test (x uuid);
truncate table test;
insert into test (x) select gen_random_uuid() from
generate_series(1,1000*1000*1000);
vacuum_freeze test;

select pg_size_pretty(pg_table_size('test'));
pg_size_pretty
----------------
41 GB

create index on test (x);

select pg_size_pretty(pg_total_relation_size('test'));
pg_size_pretty
----------------
71 GB

select max(ctid) from test;
max
--------------
(5405405,75)

delete from test where ctid < '(100000,0)'::tid;
delete from test where ctid > '(5300000,0)'::tid;

vacuum (verbose, truncate off) test;

both:
INFO: vacuuming "john.naylor.public.test"
INFO: finished vacuuming "john.naylor.public.test": index scans: 1
index scan needed: 205406 pages from table (3.80% of total) had 38000000
dead item identifiers removed

--
master:
system usage: CPU: user: 134.32 s, system: 19.24 s, elapsed: 286.14 s

v29 patch:
system usage: CPU: user: 97.71 s, system: 45.78 s, elapsed: 573.94 s

The entire vacuum took 25% less wall clock time. Reminder that this is
without wal logging, and also unscientific because only one run.

--
I took 10 seconds of perf data while index vacuuming was going on (showing
calls > 2%):

master:
40.59% postgres postgres [.] vac_cmp_itemptr
24.97% postgres libc-2.17.so [.] bsearch
6.67% postgres postgres [.] btvacuumpage
4.61% postgres [kernel.kallsyms] [k] copy_user_enhanced_fast_string
3.48% postgres postgres [.] PageIndexMultiDelete
2.67% postgres postgres [.] vac_tid_reaped
2.03% postgres postgres [.] compactify_tuples
2.01% postgres libc-2.17.so [.] __memcpy_ssse3_back

v29 patch:

29.22% postgres postgres [.] TidStoreIsMember
9.30% postgres postgres [.] btvacuumpage
7.76% postgres postgres [.] PageIndexMultiDelete
6.31% postgres [kernel.kallsyms] [k] copy_user_enhanced_fast_string
5.60% postgres postgres [.] compactify_tuples
4.26% postgres libc-2.17.so [.] __memcpy_ssse3_back
4.12% postgres postgres [.] hash_search_with_hash_value

--
master:
psql -c "select phase, heap_blks_total, heap_blks_scanned, max_dead_tuples,
num_dead_tuples from pg_stat_progress_vacuum"
phase | heap_blks_total | heap_blks_scanned | max_dead_tuples
| num_dead_tuples
-------------------+-----------------+-------------------+-----------------+-----------------
vacuuming indexes | 5405406 | 5405406 | 178956969
| 38000000

v29 patch:
psql -c "select phase, heap_blks_total, heap_blks_scanned,
max_dead_tuple_bytes, dead_tuple_bytes from pg_stat_progress_vacuum"
phase | heap_blks_total | heap_blks_scanned |
max_dead_tuple_bytes | dead_tuple_bytes
-------------------+-----------------+-------------------+----------------------+------------------
vacuuming indexes | 5405406 | 5405406 |
1073670144 | 8678064

Here, the old array pessimistically needs 1GB allocated (as for any table >
~5GB), but only fills 228MB for tid lookup. The patch reports 8.7MB. Tables
that only fit, say, 30-50 tuples per page will have less extreme
differences in memory use. Same for the case where only a couple dead items
occur per page, with many uninteresting pages in between. Even so, the
allocation will be much more accurately sized in the patch, especially in
non-parallel vacuum.

There are other cases that could be tested (I mentioned some above), but
this is enough to show the improvements possible.

I still need to do some cosmetic follow-up to v29 as well as a status
report, and I will try to get back to that soon.

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2023-02-23 10:37:58 Re: buildfarm + meson
Previous Message Richard Guo 2023-02-23 09:37:44 Re: Wrong query results caused by loss of join quals