Re: [HACKERS] Block level parallel vacuum

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-01-30 01:06:11
Message-ID: CAJrrPGdhBhp1FApSeoD15UmcAPMc-ML+k-OCgT1GNCRnU0kuWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 24, 2019 at 1:16 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

>
> Attached the latest patches.
>

Thanks for the updated patches.
Some more code review comments.

+ started by a single utility command. Currently, the parallel
+ utility commands that support the use of parallel workers are
+ <command>CREATE INDEX</command> and <command>VACUUM</command>
+ without <literal>FULL</literal> option, and only when building
+ a B-tree index. Parallel workers are taken from the pool of

I feel the above sentence may not give the proper picture, how about the
adding following modification?

<command>CREATE INDEX</command> only when building a B-tree index
and <command>VACUUM</command> without <literal>FULL</literal> option.

+ * parallel vacuum, we perform both index vacuum and index cleanup in
parallel.
+ * Individual indexes is processed by one vacuum process. At beginning of

How about vacuum index and cleanup index similar like other places?

+ * memory space for dead tuples. When starting either index vacuum or
cleanup
+ * vacuum, we launch parallel worker processes. Once all indexes are
processed

same here as well?

+ * Before starting parallel index vacuum and parallel cleanup index we
launch
+ * parallel workers. All parallel workers will exit after processed all
indexes

parallel vacuum index and parallel cleanup index?

+ /*
+ * If there is already-updated result in the shared memory we
+ * use it. Otherwise we pass NULL to index AMs and copy the
+ * result to the shared memory segment.
+ */
+ if (lvshared->indstats[idx].updated)
+ result = &(lvshared->indstats[idx].stats);

I didn't really find a need of the flag to differentiate the stats pointer
from
first run to second run? I don't see any problem in passing directing the
stats
and the same stats are updated in the worker side and leader side. Anyway
no two
processes will do the index vacuum at same time. Am I missing something?

Even if this flag is to identify whether the stats are updated or not before
writing them, I don't see a need of it compared to normal vacuum.

+ * Enter the parallel mode, allocate and initialize a DSM segment. Return
+ * the memory space for storing dead tuples or NULL if no workers are
prepared.
+ */

+ pcxt = CreateParallelContext("postgres", "heap_parallel_vacuum_main",
+ request, true);

But we are passing as serializable_okay flag as true, means it doesn't
return
NULL. Is it expected?

+ initStringInfo(&buf);
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker %s (planned: %d",
+ "launched %d parallel vacuum workers %s (planned: %d",
+ lvstate->pcxt->nworkers_launched),
+ lvstate->pcxt->nworkers_launched,
+ for_cleanup ? "for index cleanup" : "for index vacuum",
+ lvstate->pcxt->nworkers);
+ if (lvstate->options.nworkers > 0)
+ appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);

what is the difference between planned workers and requested workers,
aren't both
are same?

- COMPARE_SCALAR_FIELD(options);
- COMPARE_NODE_FIELD(rels);
+ if (a->options.flags != b->options.flags)
+ return false;
+ if (a->options.nworkers != b->options.nworkers)
+ return false;

Options is changed from SCALAR to check, but why the rels check is removed?
The options is changed from int to a structure so using SCALAR may not work
in other function like _copyVacuumStmt and etc?

+typedef struct VacuumOptions
+{
+ VacuumFlag flags; /* OR of VacuumFlag */
+ int nworkers; /* # of parallel vacuum workers */
+} VacuumOptions;

Do we need to add NodeTag for the above structure? Because this structure is
part of VacuumStmt structure.

+ <application>vacuumdb</application> will require background
workers,
+ so make sure your <xref
linkend="guc-max-parallel-workers-maintenance"/>
+ setting is more than one.

removing vacuumdb and changing it as "This option will ..."?

I will continue the testing of this patch and share the details.

Regards,
Haribabu Kommi
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2019-01-30 01:16:14 Re: Covering GiST indexes
Previous Message Michael Paquier 2019-01-30 00:46:53 Re: A few new options for vacuumdb