Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-03-19 04:31:04
Message-ID: CAD21AoD4ivrYqg5tau460zEEcgR0t9cV-UagjJ997OfvP3gsNQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 18, 2019 at 7:06 PM Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Hello.
>
> At Mon, 18 Mar 2019 11:54:42 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC6bsM0FfePgzSV40uXofbFSPe-Ax095TOnu5GOZ790uA(at)mail(dot)gmail(dot)com>
> > Here are the performance test results. I set up a 500MB table with
> > several indexes and made 10% of the table dirty before each vacuum.
> > Compared the execution time of the patched postgres with the current
> > HEAD (see the 'speed_up' column). In my environment,
> >
> > indexes | parallel_degree | patched | head | speed_up
> > ---------+-----------------+------------+------------+----------
> > 0 | 0 | 238.2085 | 244.7625 | 1.0275
> > 0 | 1 | 237.7050 | 244.7625 | 1.0297
> > 0 | 2 | 238.0390 | 244.7625 | 1.0282
> > 0 | 4 | 238.1045 | 244.7625 | 1.0280
> > 0 | 8 | 237.8995 | 244.7625 | 1.0288
> > 0 | 16 | 237.7775 | 244.7625 | 1.0294
> > 1 | 0 | 1328.8590 | 1334.9125 | 1.0046
> > 1 | 1 | 1325.9140 | 1334.9125 | 1.0068
> > 1 | 2 | 1333.3665 | 1334.9125 | 1.0012
> > 1 | 4 | 1329.5205 | 1334.9125 | 1.0041
> > 1 | 8 | 1334.2255 | 1334.9125 | 1.0005
> > 1 | 16 | 1335.1510 | 1334.9125 | 0.9998
> > 2 | 0 | 2426.2905 | 2427.5165 | 1.0005
> > 2 | 1 | 1416.0595 | 2427.5165 | 1.7143
> > 2 | 2 | 1411.6270 | 2427.5165 | 1.7197
> > 2 | 4 | 1411.6490 | 2427.5165 | 1.7196
> > 2 | 8 | 1410.1750 | 2427.5165 | 1.7214
> > 2 | 16 | 1413.4985 | 2427.5165 | 1.7174
> > 4 | 0 | 4622.5060 | 4619.0340 | 0.9992
> > 4 | 1 | 2536.8435 | 4619.0340 | 1.8208
> > 4 | 2 | 2548.3615 | 4619.0340 | 1.8126
> > 4 | 4 | 1467.9655 | 4619.0340 | 3.1466
> > 4 | 8 | 1486.3155 | 4619.0340 | 3.1077
> > 4 | 16 | 1481.7150 | 4619.0340 | 3.1174
> > 8 | 0 | 9039.3810 | 8990.4735 | 0.9946
> > 8 | 1 | 4807.5880 | 8990.4735 | 1.8701
> > 8 | 2 | 3786.7620 | 8990.4735 | 2.3742
> > 8 | 4 | 2924.2205 | 8990.4735 | 3.0745
> > 8 | 8 | 2684.2545 | 8990.4735 | 3.3493
> > 8 | 16 | 2672.9800 | 8990.4735 | 3.3635
> > 16 | 0 | 17821.4715 | 17740.1300 | 0.9954
> > 16 | 1 | 9318.3810 | 17740.1300 | 1.9038
> > 16 | 2 | 7260.6315 | 17740.1300 | 2.4433
> > 16 | 4 | 5538.5225 | 17740.1300 | 3.2030
> > 16 | 8 | 5368.5255 | 17740.1300 | 3.3045
> > 16 | 16 | 5291.8510 | 17740.1300 | 3.3523
> > (36 rows)
>
> For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> almost the same. I suspect that the indexes are too small, all
> the index pages were in memory, and the CPU was saturated. Maybe you
> had four cores, so more parallel workers than that number had no
> effect. Other normal backends would have been able to do almost
> nothing meanwhile. Usually the number of parallel workers is
> determined so that IO capacity is filled up, but this feature
> intermittently saturates CPU capacity under such a
> situation.
>

I'm sorry I didn't make it clear enough. If the parallel degree is
higher than 'the number of indexes - 1', redundant workers are not
launched. So for indexes = 4, 8 and 16 the number of actually launched
parallel workers is at most 3, 7 and 15 respectively. That's why the
results show almost the same execution time in the cases where
nindexes <= parallel_degree.
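
As a rough sketch (not the actual patch code; compute_parallel_workers
and the use of max_parallel_maintenance_workers here are just my
illustration), the worker count is capped along these lines:

    /*
     * Illustrative only: cap the number of additional workers.  The
     * leader also vacuums indexes, so more than (nindexes - 1) extra
     * workers cannot help.
     */
    static int
    compute_parallel_workers(int nrequested, int nindexes)
    {
        int     nworkers = Min(nrequested, nindexes - 1);

        /* assumption: the maintenance worker limit applies here too */
        nworkers = Min(nworkers, max_parallel_maintenance_workers);

        return Max(nworkers, 0);
    }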

I'll share the performance test results for larger tables and indexes.

> I'm not sure, but what if we did index vacuum in a tuple-by-tuple
> manner? That is, heap vacuum passes dead tuples one by one (or
> buffers a few tuples) to workers, and workers process them not by
> bulkdelete but by a plain tuple_delete (which we don't have). That
> could avoid the heap scan sleeping while the index bulkdelete runs.
>

Just to be clear, in parallel lazy vacuum all parallel vacuum
processes, including the leader process, do index vacuuming; no one
sleeps during index vacuuming. The leader process does the heap scan
and launches parallel workers before index vacuuming. Each process
then claims and vacuums indexes one by one, so every index is
processed by exactly one process.
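
In other words, the index-vacuuming loop looks roughly like this
(simplified; the variable names are taken from the patch context or
made up for illustration):

    for (;;)
    {
        int     idx;

        /* atomically claim the next unprocessed index */
        idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);

        if (idx >= nindexes)
            break;              /* every index has been claimed */

        /* each index is vacuumed by exactly one process */
        lazy_vacuum_index(Irel[idx], &indstats[idx], vacrelstats);
    }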

Such an index deletion method could be an optimization, but I'm not
sure that calling tuple_delete many times would be faster than one
bulkdelete. If there are many dead tuples, vacuum has to call
tuple_delete once per dead tuple, and in general one sequential scan
is faster than tons of index scans. There is a proposal for such
one-by-one index deletion[1], but it's not a replacement for
bulkdelete.
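
To illustrate the cost argument (pseudocode only; index_tuple_delete
does not exist as an index AM callback, and the LVDeadTuples field
names are taken from the patch):

    /*
     * bulkdelete: a single pass over the whole index, checking each
     * index tuple's heap TID against the in-memory dead-tuple array --
     * roughly O(index size), independent of the number of dead tuples.
     */
    istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
                              (void *) dead_tuples);

    /*
     * Hypothetical tuple-at-a-time deletion: one index descent per dead
     * tuple -- roughly O(ndead * descent cost), which loses when there
     * are many dead tuples.
     */
    for (i = 0; i < dead_tuples->num_tuples; i++)
        index_tuple_delete(indrel, &dead_tuples->itemptrs[i]);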

>
> > Attached are the updated version patches. They apply cleanly to the
> > current HEAD, but the 0001 patch still changes the vacuum option to a
> > Node since that is still under discussion. After the direction has
> > been decided, I'll update the patches.
>
> As for the to-be-or-not-to-be-a-node problem, I don't think it is
> needed, but from the point of view of consistency it seems reasonable,
> and it is seen in other nodes that a *Stmt node holds an options node.
> But makeVacOpt, its usage, and the subsequent operations on the node
> look somewhat strange. Why don't you just do
> "makeNode(VacuumOptions)"?

Thank you for the comment, but this part has gone away since a recent
commit changed the grammar production of the VACUUM command.

>
>
> >+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> >+ maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
>
> If I understand this correctly, nindexes is always > 1 there. At
> least assert that it is > 0 there.
>
> >+ estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),
>
> I don't think the name is good. (At first glance 'dt' read as
> 'detach' to me..)

Fixed.

>
> >+ if (lps->nworkers_requested > 0)
> >+ appendStringInfo(&buf,
> >+ ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
>
> "planned"?

'planned' shows how many parallel workers we planned to launch. The
degree of parallelism is determined based on either the user's request
or the number of indexes that the table has.
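
For reference, the reporting would look roughly like this
(nworkers_launched and nworkers_planned are placeholder variable
names; only nworkers_requested appears in the quoted hunk):

    if (lps->nworkers_requested > 0)
        appendStringInfo(&buf,
                         ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
                                  "launched %d parallel vacuum workers for index cleanup (planned: %d, requested %d)",
                                  nworkers_launched),
                         nworkers_launched,
                         nworkers_planned,
                         lps->nworkers_requested);
    else
        appendStringInfo(&buf,
                         ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
                                  "launched %d parallel vacuum workers for index cleanup (planned: %d)",
                                  nworkers_launched),
                         nworkers_launched,
                         nworkers_planned);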

>
>
> >+ /* Get the next index to vacuum */
> >+ if (do_parallel)
> >+ idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
> >+ else
> >+ idx = nprocessed++;
>
> It seems that both cases can be handled using LVParallelState, and
> most of the branches on lps or do_parallel can be removed.
>

Sorry, I couldn't get your point. Did you mean moving nprocessed into
LVParallelState?
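
For example, did you mean something like the following, where the
counter always lives in the parallel state so the do_parallel branch
goes away? (Just a sketch to confirm my understanding; whether
lps->lvshared is always allocated in the non-parallel case is the part
I'm not sure about.)

    /* same code path whether or not workers were launched */
    idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);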

[1] https://www.postgresql.org/message-id/flat/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
