Re: [HACKERS] Block level parallel vacuum

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: kommi(dot)haribabu(at)gmail(dot)com, amit(dot)kapila16(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, david(at)pgmasters(dot)net, klaussfreire(at)gmail(dot)com, simon(at)2ndquadrant(dot)com, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-03-18 10:06:35
Message-ID: 20190318.190635.71361340.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello.

At Mon, 18 Mar 2019 11:54:42 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC6bsM0FfePgzSV40uXofbFSPe-Ax095TOnu5GOZ790uA(at)mail(dot)gmail(dot)com>
> Here are the performance test results. I've set up a 500MB table with
> several indexes and made 10% of the table dirty before each vacuum.
> I compared the execution time of the patched postgres with the current
> HEAD (the 'speed_up' column). In my environment,
>
>  indexes | parallel_degree |  patched   |    head    | speed_up
> ---------+-----------------+------------+------------+----------
>        0 |               0 |   238.2085 |   244.7625 |   1.0275
>        0 |               1 |   237.7050 |   244.7625 |   1.0297
>        0 |               2 |   238.0390 |   244.7625 |   1.0282
>        0 |               4 |   238.1045 |   244.7625 |   1.0280
>        0 |               8 |   237.8995 |   244.7625 |   1.0288
>        0 |              16 |   237.7775 |   244.7625 |   1.0294
>        1 |               0 |  1328.8590 |  1334.9125 |   1.0046
>        1 |               1 |  1325.9140 |  1334.9125 |   1.0068
>        1 |               2 |  1333.3665 |  1334.9125 |   1.0012
>        1 |               4 |  1329.5205 |  1334.9125 |   1.0041
>        1 |               8 |  1334.2255 |  1334.9125 |   1.0005
>        1 |              16 |  1335.1510 |  1334.9125 |   0.9998
>        2 |               0 |  2426.2905 |  2427.5165 |   1.0005
>        2 |               1 |  1416.0595 |  2427.5165 |   1.7143
>        2 |               2 |  1411.6270 |  2427.5165 |   1.7197
>        2 |               4 |  1411.6490 |  2427.5165 |   1.7196
>        2 |               8 |  1410.1750 |  2427.5165 |   1.7214
>        2 |              16 |  1413.4985 |  2427.5165 |   1.7174
>        4 |               0 |  4622.5060 |  4619.0340 |   0.9992
>        4 |               1 |  2536.8435 |  4619.0340 |   1.8208
>        4 |               2 |  2548.3615 |  4619.0340 |   1.8126
>        4 |               4 |  1467.9655 |  4619.0340 |   3.1466
>        4 |               8 |  1486.3155 |  4619.0340 |   3.1077
>        4 |              16 |  1481.7150 |  4619.0340 |   3.1174
>        8 |               0 |  9039.3810 |  8990.4735 |   0.9946
>        8 |               1 |  4807.5880 |  8990.4735 |   1.8701
>        8 |               2 |  3786.7620 |  8990.4735 |   2.3742
>        8 |               4 |  2924.2205 |  8990.4735 |   3.0745
>        8 |               8 |  2684.2545 |  8990.4735 |   3.3493
>        8 |              16 |  2672.9800 |  8990.4735 |   3.3635
>       16 |               0 | 17821.4715 | 17740.1300 |   0.9954
>       16 |               1 |  9318.3810 | 17740.1300 |   1.9038
>       16 |               2 |  7260.6315 | 17740.1300 |   2.4433
>       16 |               4 |  5538.5225 | 17740.1300 |   3.2030
>       16 |               8 |  5368.5255 | 17740.1300 |   3.3045
>       16 |              16 |  5291.8510 | 17740.1300 |   3.3523
> (36 rows)

For indexes = 4, 8, 16, the cases with parallel_degree = 4, 8, 16 behave
almost the same. I suspect that the indexes are too small, so all the
index pages fit in memory and the CPU is saturated. Perhaps your machine
has four cores, and adding parallel workers beyond that number had no
effect. Meanwhile, other normal backends would have been able to do
almost nothing. Usually the number of parallel workers is chosen so that
the I/O capacity is filled up, but under such a situation this feature
instead saturates the CPU capacity intermittently.

I'm not sure, but what if we did the index vacuum in a one-tuple-at-a-time
manner? That is, the heap vacuum passes dead tuples one by one (or
buffers a few tuples) to the workers, and the workers process them not by
bulkdelete but by a per-tuple tuple_delete (which we don't have). That
could avoid the time the heap scan sleeps while the index bulkdelete runs.
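
Very roughly, the worker side of that idea could look like the sketch
below; shm_mq_pop_tid() and index_tuple_delete() are hypothetical names I
made up just to show the shape, neither exists today:

/*
 * Illustrative sketch only: the worker consumes TIDs as the heap scan
 * produces them, instead of waiting for a full dead-tuple array and
 * calling ambulkdelete.
 */
ItemPointerData tid;

while (shm_mq_pop_tid(queue, &tid))       /* hypothetical: false at end of scan */
    index_tuple_delete(indrel, &tid);     /* hypothetical per-tuple AM entry */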

> Attached the updated version of the patches. The patches apply cleanly
> to the current HEAD, but the 0001 patch still changes the vacuum options
> to a Node since that is under discussion. After the direction has been
> decided, I'll update the patches.

As for the to-be-or-not-to-be-a-node problem, I don't think a node is
strictly needed, but from the point of view of consistency it seems
reasonable, and other *Stmt nodes do hold an options node. However,
makeVacOpt, its usage, and the subsequent operations on the node look
somewhat strange. Why don't you just use
"makeNode(VacuumOptions)"?

>+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
>+ maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);

If I understand this correctly, nindexes is always > 1 there. It should
at least be asserted that nindexes > 0 there.
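
Concretely, a sketch of the suggested change at that call site:

/* we should only get here with at least one index */
Assert(nindexes > 0);
maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);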

>+ estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),

I don't think the name is good. (At first glance, "dt" read as "detach" to me.)

>+ if (lps->nworkers_requested > 0)
>+ appendStringInfo(&buf,
>+ ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",

"planned"?

>+ /* Get the next index to vacuum */
>+ if (do_parallel)
>+ idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
>+ else
>+ idx = nprocessed++;

It seems that both cases could be handled through LVParallelState, and
most of the branches on lps or do_parallel could then be removed.
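
For example, if lvshared->nprocessed were initialized even in the
non-parallel case, a hypothetical helper like the following would serve
both paths, and the do_parallel branch in the caller could go away:

/*
 * Hypothetical helper: works for both the parallel and the
 * single-process case, provided lps->lvshared is always set up.
 */
static uint32
get_next_index_to_vacuum(LVParallelState *lps)
{
    return pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
}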

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
