From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, k(dot)jamison(at)fujitsu(dot)com, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: VACUUM memory management
Date: 2020-04-05 00:22:43
Message-ID: 20200405002242.GC12283@telsasoft.com
Lists: pgsql-hackers

On Fri, Apr 03, 2020 at 09:04:34PM +0500, Ibrar Ahmed wrote:
> Here is the latest patch rebased with master
> (19db23bcbda99e93321cb0636677ec9c6e121a2a) Fri Apr 3 12:20:42 2020. Patch
> fix all the issues, after the parallel vacuum patch. The patch works in
> case of a non-parallel option and allocates memory in chunks.

This patch seems to break vacuuming. On unpatched master, it scans the index:

|postgres=# DROP TABLE t; CREATE UNLOGGED TABLE t AS SELECT generate_series(1,499999)a; CREATE INDEX ON t(a); SET maintenance_work_mem='1024kB'; UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t \\ UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t \\ UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t \\ UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t
|...
|INFO: vacuuming "public.t"
|INFO: scanned index "t_a_idx" to remove 174472 row versions
|...
|INFO: index "t_a_idx" now contains 499999 row versions in 4119 pages
|DETAIL: 499999 index row versions were removed.
|...
|INFO: "t": found 499999 removable, 499999 nonremovable row versions in 4425 out of 4425 pages

With this patch, if chunks are in use, it doesn't scan the indexes. Also, the
table is continuously growing, which means the heap vacuum is broken, too:
public | t | table | pryzbyj | unlogged | 35 MB |
public | t | table | pryzbyj | unlogged | 47 MB |
public | t | table | pryzbyj | unlogged | 59 MB |
public | t | table | pryzbyj | unlogged | 73 MB |

If chunks *aren't* in use (note smaller table), it looks like at least the
displayed output is wrong for "row versions":
|template1=# DROP TABLE t; CREATE UNLOGGED TABLE t AS SELECT generate_series(1,199999)a; CREATE INDEX ON t(a); SET maintenance_work_mem='1024kB'; UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t \\ UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t \\ UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t \\ UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t
|...
|UPDATE 199999
|INFO: vacuuming "public.t"
|INFO: index "t_a_idx" now contains 0 row versions in 1099 pages
|DETAIL: 0 index row versions were removed.

There are also some compiler warnings:
vacuumlazy.c:2882:1: warning: ‘lazy_space_dealloc’ defined but not used [-Wunused-function]
vacuumlazy.c:1883:5: warning: ‘tupindex’ may be used uninitialized in this function [-Wmaybe-uninitialized]

When you call lazy_vacuum_all_indexes() and then lazy_vacuum_heap(), you reset
the dead tuple count to zero, but not vacrelstats->num_chunks. It seems to me
that means only the *first* iteration of lazy_vacuum_* benefits from the
chunks; the 2nd and later iterations not only don't benefit, but impose the
cost of an index scan for each chunk, all but the last of which have nothing
to do.

The patch is rebased on top of the parallel vacuum implementation, but doesn't
allow parallel vacuum to use "chunks", right? I think it's important to
handle parallel vacuum somehow, since it's the default for manual (but not
auto) vacuum.

I think max_tuples should be a property of LVRelStats rather than LVDeadTuples,
since it doesn't vary by chunk. Also, there's a weird bit which seems intended
to initialize new chunks, but it happens even if you didn't add a chunk, and
then it ends up setting the variable to itself:
|int maxtuples = dead_tuples[num_chunks]->max_tuples;
|...
|dead_tuples[num_chunks]->max_tuples = maxtuples;

Maybe num_tuples should be in LVRelStats, too, indicating the number of tuples
for dead_tuples[num_chunks]->itemptrs. Looks like that's how it was before
parallel vacuum. I think now those would need to be in LVShared, too.
Then, LVDeadTuples is nothing but a pointer. Maybe that would simplify
supporting this for parallel vacuum.
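Something like this layout is what I have in mind (field names are mine, with
num_tuples renamed for the reason below):

	typedef struct LVDeadTuples
	{
		ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER];	/* just the TIDs */
	} LVDeadTuples;

	/* in LVRelStats (and mirrored in LVShared for parallel vacuum): */
	int			max_tuples;			/* allocated entries per chunk */
	int			num_dead_tuples;	/* used entries in the newest chunk */
	int			num_chunks;			/* number of chunks allocated so far */
	LVDeadTuples **dead_tuples;		/* array of chunk pointers */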

Also, num_tuples is a bad name, since it's also a local variable in
lazy_scan_heap():
| double num_tuples, /* total number of nonremovable tuples */

The patch changes lazy_vacuum_index so each index is scanned N times, once per
chunk. That can be pretty expensive. If there are many dead tuples, this patch
starts with a small memory allocation and grows it dynamically, but at the cost
of doing a multiple of the I/O. That seems like a bad tradeoff: if there are 10
chunks, the first index scan will only handle 10% of what's needed.

I wonder if it's possible to make bsearch() handle the list of lists, to allow
doing a single index scan per iteration rather than one per chunk. I don't
think bsearch() dereferences the element pointers itself, but rather just
passes them to the callback. You could make the callback compute, for each of
its two arguments, which chunk it falls in and the offset within that chunk,
and then do the comparison there. Maybe that's too clever, and we should just
include our own bsearch().
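An open-coded search over the chunks might look like this (untested sketch,
using the field names from above, and assuming all chunks but the newest are
full, with the TIDs sorted across chunks):

	static bool
	lazy_tid_reaped_chunked(LVRelStats *vacrelstats, ItemPointer itemptr)
	{
		long	lo = 0;
		long	hi = (long) (vacrelstats->num_chunks - 1) * vacrelstats->max_tuples
				   + vacrelstats->num_dead_tuples - 1;

		while (lo <= hi)
		{
			long		mid = lo + (hi - lo) / 2;
			/* map the logical index to (chunk, offset within chunk) */
			LVDeadTuples *chunk = vacrelstats->dead_tuples[mid / vacrelstats->max_tuples];
			ItemPointer probe = &chunk->itemptrs[mid % vacrelstats->max_tuples];
			int			cmp = ItemPointerCompare(itemptr, probe);

			if (cmp == 0)
				return true;
			else if (cmp < 0)
				hi = mid - 1;
			else
				lo = mid + 1;
		}
		return false;
	}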

Vacuum only runs in parallel for index vacuum and cleanup (but not heap scan or
heap vacuum). Right now, dead_tuples[0] is allocated in
begin_parallel_vacuum(), which is called at the beginning of lazy_scan_heap().
I guess it's not possible to dynamically resize that, but is there any reason
you can't destroy it and recreate it as needed during the heap scan? I guess
one reason is that we want to avoid allocating a new DSM segment of size 2*N in
addition to the existing one of size N, copying the original allocation to the
new one, and then destroying the original; that means a max memory use of 3*N,
not just 2*N :( Maybe overcommit/lazy allocation by the OS means that's not
always true... One way to do it would be: if you gather N dead tuples, trigger
an index/heap vacuum, then destroy the dead_tuples and allocate a new one of
twice the size (no need to copy the old one). That still incurs the cost of
multiple (additional) index scans during the early iterations, when the
allocation is small, which isn't great.
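i.e. something like this (pseudo-code; dead_tuples_is_full() is made up, and in
reality the segment would be managed through the parallel context rather than
raw dsm calls):

	if (dead_tuples_is_full(vacrelstats))
	{
		/* consume the TIDs accumulated so far */
		lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats, lps, nindexes);
		lazy_vacuum_heap(onerel, vacrelstats);

		/*
		 * The old TIDs are now garbage, so drop the old segment *before*
		 * creating the bigger one: peak memory stays at 2*N and nothing
		 * needs to be copied.
		 */
		dsm_detach(seg);
		maxtuples = Min(maxtuples * 2, MAXDEADTUPLES(maintenance_work_mem * 1024L));
		seg = dsm_create(SizeOfDeadTuples(maxtuples), 0);
	}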

Marking this patch as RWF.

--
Justin
