From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2023-02-24 08:40:27
Message-ID: CAD21AoDAKmUG6LFvbzWqEgvXZBgYoap2yL74N8QkVgzg1cW-Tg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 23, 2023 at 6:41 PM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
> I ran a couple of "in situ" tests on server hardware using UUID columns, since they are common in the real world and have bad correlation to heap order, so they are a challenge for index vacuum.

Thank you for the test!

>
> === test 1, delete everything from a small table, with very small maintenance_work_mem:
>
> alter system set shared_buffers ='4GB';
> alter system set max_wal_size ='10GB';
> alter system set checkpoint_timeout ='30 min';
> alter system set autovacuum =off;
>
> -- unrealistically low
> alter system set maintenance_work_mem = '32MB';
>
> create table if not exists test (x uuid);
> truncate table test;
> insert into test (x) select gen_random_uuid() from generate_series(1,50*1000*1000);
> create index on test (x);
>
> delete from test;
> vacuum (verbose, truncate off) test;
> --
>
> master:
> INFO: finished vacuuming "john.naylor.public.test": index scans: 9
> system usage: CPU: user: 70.04 s, system: 19.85 s, elapsed: 802.06 s
>
> v29 patch:
> INFO: finished vacuuming "john.naylor.public.test": index scans: 1
> system usage: CPU: user: 9.80 s, system: 2.62 s, elapsed: 36.68 s
>
> This is a bit artificial, but it's easy to construct cases where the array leads to multiple index scans while the new tid store can fit everything without breaking a sweat. I didn't save the progress reporting, but v29 was using about 11MB for tid storage.

Cool.
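
For anyone checking the math: master stores each dead TID as a 6-byte
ItemPointerData, so 32MB holds about 5.59 million TIDs, and deleting 50
million tuples forces ceil(50M / 5.59M) = 9 index scan passes, exactly as
reported. Against that, ~11MB in the tid store for all 50 million TIDs is
a large win.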

>
>
> === test 2: try to stress tid lookup with production maintenance_work_mem:
> 1. use unlogged table to reduce noise
> 2. vacuum freeze first to reduce heap scan time
> 3. delete some records at the beginning and end of heap to defeat binary search's pre-check
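
For readers who haven't looked at the code: the pre-check in step 3 is the
cheap bound test master does before falling back to bsearch(). A minimal
sketch of the idea, paraphrasing lazy_tid_reaped() rather than quoting it,
with encode_tid() standing in for the real itemptr_encode():

#include "postgres.h"
#include "storage/itemptr.h"

/* Sketch stand-in for itemptr_encode(): block first, then offset. */
static inline int64
encode_tid(ItemPointer tid)
{
    return ((int64) ItemPointerGetBlockNumber(tid) << 16) |
        ItemPointerGetOffsetNumber(tid);
}

static int
cmp_tid(const void *a, const void *b)
{
    int64       la = encode_tid((ItemPointer) a);
    int64       lb = encode_tid((ItemPointer) b);

    return (la > lb) - (la < lb);
}

/* Sketch of the lookup master performs for every index tuple. */
static bool
tid_reaped_sketch(ItemPointer tid, ItemPointerData *items, int num_items)
{
    int64       item = encode_tid(tid);

    /*
     * Bound pre-check: if the TID falls outside [min, max] of the sorted
     * dead-TID array, skip the bsearch entirely.  Deleting at both ends
     * of the heap makes the range cover the whole table, so this never
     * fires and every lookup pays for a full binary search.
     */
    if (item < encode_tid(&items[0]) ||
        item > encode_tid(&items[num_items - 1]))
        return false;

    return bsearch(tid, items, num_items, sizeof(ItemPointerData),
                   cmp_tid) != NULL;
}
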
>
> alter system set shared_buffers ='4GB';
> alter system set max_wal_size ='10GB';
> alter system set checkpoint_timeout ='30 min';
> alter system set autovacuum =off;
>
> alter system set maintenance_work_mem = '1GB';
>
> create unlogged table if not exists test (x uuid);
> truncate table test;
> insert into test (x) select gen_random_uuid() from generate_series(1,1000*1000*1000);
> vacuum freeze test;
>
> select pg_size_pretty(pg_table_size('test'));
> pg_size_pretty
> ----------------
> 41 GB
>
> create index on test (x);
>
> select pg_size_pretty(pg_total_relation_size('test'));
> pg_size_pretty
> ----------------
> 71 GB
>
> select max(ctid) from test;
> max
> --------------
> (5405405,75)
>
> delete from test where ctid < '(100000,0)'::tid;
> delete from test where ctid > '(5300000,0)'::tid;
>
> vacuum (verbose, truncate off) test;
>
> both:
> INFO: vacuuming "john.naylor.public.test"
> INFO: finished vacuuming "john.naylor.public.test": index scans: 1
> index scan needed: 205406 pages from table (3.80% of total) had 38000000 dead item identifiers removed
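
(Sanity check on those numbers: 1 billion rows over 5,405,406 blocks is
roughly 185 tuples per page; the two deletes cover 100,000 + 105,406 =
205,406 pages, and 205,406 x 185 comes to about 38 million dead item
identifiers, matching the report.)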
>
> --
> master:
> system usage: CPU: user: 134.32 s, system: 19.24 s, elapsed: 286.14 s
>
> v29 patch:
> system usage: CPU: user: 97.71 s, system: 45.78 s, elapsed: 573.94 s

With v29, vacuum took twice as long (286 s vs. 573 s)?

>
> The entire vacuum took 25% less wall clock time. Reminder that this is without WAL logging, and also unscientific because it was only one run.
>
> --
> I took 10 seconds of perf data while index vacuuming was going on (showing calls > 2%):
>
> master:
> 40.59% postgres postgres [.] vac_cmp_itemptr
> 24.97% postgres libc-2.17.so [.] bsearch
> 6.67% postgres postgres [.] btvacuumpage
> 4.61% postgres [kernel.kallsyms] [k] copy_user_enhanced_fast_string
> 3.48% postgres postgres [.] PageIndexMultiDelete
> 2.67% postgres postgres [.] vac_tid_reaped
> 2.03% postgres postgres [.] compactify_tuples
> 2.01% postgres libc-2.17.so [.] __memcpy_ssse3_back
>
> v29 patch:
>
> 29.22% postgres postgres [.] TidStoreIsMember
> 9.30% postgres postgres [.] btvacuumpage
> 7.76% postgres postgres [.] PageIndexMultiDelete
> 6.31% postgres [kernel.kallsyms] [k] copy_user_enhanced_fast_string
> 5.60% postgres postgres [.] compactify_tuples
> 4.26% postgres libc-2.17.so [.] __memcpy_ssse3_back
> 4.12% postgres postgres [.] hash_search_with_hash_value
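
To illustrate why the profile shifts: conceptually the tid store replaces
a binary search over one flat array with a radix-tree descent keyed by
block number plus a bit test in a per-block offset bitmap. A rough
conceptual sketch only; radix_tree_find() and the types here are
hypothetical stand-ins, not the patch's actual API:

#include "postgres.h"
#include "storage/itemptr.h"

typedef struct BlockEntry
{
    uint64      offset_bitmap[4];   /* one bit per offset, 0..255 */
} BlockEntry;

/* hypothetical radix tree type and lookup, standing in for the patch's */
typedef struct radix_tree radix_tree;
extern BlockEntry *radix_tree_find(radix_tree *tree, uint64 key);

static bool
tidstore_is_member_sketch(radix_tree *tree, ItemPointer tid)
{
    BlockNumber blk = ItemPointerGetBlockNumber(tid);
    OffsetNumber off = ItemPointerGetOffsetNumber(tid);
    BlockEntry *entry;

    /* one pointer chase per tree level, instead of O(log n) probes */
    entry = radix_tree_find(tree, blk);
    if (entry == NULL)
        return false;

    return (entry->offset_bitmap[off / 64] >> (off % 64)) & 1;
}
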
>
> --
> master:
> psql -c "select phase, heap_blks_total, heap_blks_scanned, max_dead_tuples, num_dead_tuples from pg_stat_progress_vacuum"
> phase | heap_blks_total | heap_blks_scanned | max_dead_tuples | num_dead_tuples
> -------------------+-----------------+-------------------+-----------------+-----------------
> vacuuming indexes | 5405406 | 5405406 | 178956969 | 38000000
>
> v29 patch:
> psql -c "select phase, heap_blks_total, heap_blks_scanned, max_dead_tuple_bytes, dead_tuple_bytes from pg_stat_progress_vacuum"
> phase | heap_blks_total | heap_blks_scanned | max_dead_tuple_bytes | dead_tuple_bytes
> -------------------+-----------------+-------------------+----------------------+------------------
> vacuuming indexes | 5405406 | 5405406 | 1073670144 | 8678064
>
> Here, the old array pessimistically needs 1GB allocated (as for any table > ~5GB), but only fills 228MB for tid lookup. The patch reports 8.7MB. Tables that only fit, say, 30-50 tuples per page will have less extreme differences in memory use. Same for the case where only a couple dead items occur per page, with many uninteresting pages in between. Even so, the allocation will be much more accurately sized in the patch, especially in non-parallel vacuum.

Agreed.
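
The reported numbers are internally consistent, too: 1GB divided by 6
bytes per ItemPointerData works out to the ~179 million max_dead_tuples
master shows, the 38 million dead TIDs fill 38M x 6 = 228MB of that, and
the patch's 8,678,064 dead_tuple_bytes is the 8.7MB figure, roughly 26x
less memory for the same TIDs.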

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
