Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2018-12-20 06:37:49
Message-ID: CAA4eK1KNC+---aU3u4=amxZs5qA7xDbKNMwXe3RSrhNpMAhPSg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 18, 2018 at 1:29 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Attached the updated patches. I scaled back the scope of this patch.
> The patch now includes only feature (a), that is it execute both index
> vacuum and cleanup index in parallel. It also doesn't include
> autovacuum support for now.
>
> The PARALLEL option works almost the same as in the previous patch. In
> the VACUUM command, we can specify the 'PARALLEL n' option where n is
> the number of parallel workers to request. If n is omitted, the number
> of parallel workers would be # of indexes - 1.
>

I think this is okay for now, but I guess in the future, when we make
heap scans parallel as well or perhaps allow more than one worker per
index vacuum, this won't hold good. So, I am not sure the below text
is the most appropriate for the docs.

+ <term><literal>PARALLEL <replaceable
class="parameter">N</replaceable></literal></term>
+ <listitem>
+ <para>
+ Execute index vacuum and cleanup index in parallel with
+ <replaceable class="parameter">N</replaceable> background
workers. If the parallel
+ degree <replaceable class="parameter">N</replaceable> is omitted,
+ <command>VACUUM</command> requests the number of indexes - 1
processes, which is the
+ maximum number of parallel vacuum workers since individual
indexes is processed by
+ one process. The actual number of parallel vacuum workers may
be less due to the
+ setting of <xref linkend="guc-max-parallel-workers-maintenance"/>.
+ This option can not use with <literal>FULL</literal> option.

It might be better to use more generic language in the docs, something
like "If the parallel degree N is omitted, then vacuum decides the
number of workers based on the number of indexes on the relation, which
is further limited by max-parallel-workers-maintenance". I think you
also need to mention in some way that the parallel_workers storage
option is taken into account.
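To illustrate the suggested behavior, here is a minimal, self-contained
sketch of the worker-count decision. The function name, parameter names,
and the exact precedence order are my assumptions for illustration, not
the patch's actual code:

```c
#include <assert.h>

/*
 * Hypothetical sketch: decide how many parallel vacuum workers to
 * request.  'requested' is the N from PARALLEL N, or -1 if omitted;
 * 'parallel_workers_reloption' is the table's parallel_workers storage
 * option, or -1 if not set.
 */
static int
compute_parallel_vacuum_workers(int requested, int nindexes,
                                int parallel_workers_reloption,
                                int max_parallel_maintenance_workers)
{
    int nworkers;

    if (requested >= 0)
        nworkers = requested;
    else if (parallel_workers_reloption >= 0)
        nworkers = parallel_workers_reloption;
    else
        nworkers = nindexes - 1;    /* leader also processes one index */

    /* cap by the GUC; never go negative */
    if (nworkers > max_parallel_maintenance_workers)
        nworkers = max_parallel_maintenance_workers;
    if (nworkers < 0)
        nworkers = 0;

    return nworkers;
}
```

The idea is simply that an explicit PARALLEL N wins, then the
parallel_workers storage option, then the number-of-indexes default,
with the GUC as a final cap.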

Few assorted comments:
1.
+lazy_begin_parallel_vacuum_index(LVState *lvstate, bool for_cleanup)
{
..
+
+ LaunchParallelWorkers(lvstate->pcxt);
+
+ /*
+ * if no workers launched, we vacuum all indexes by the leader process
+ * alone. Since there is hope that we can launch workers in the next
+ * execution time we don't want to end the parallel mode yet.
+ */
+ if (lvstate->pcxt->nworkers_launched == 0)
+ return;

It is quite possible that no workers are launched because we fail to
allocate memory, basically when pcxt->nworkers is zero. I think in
such cases there is no use in being in parallel mode. You can even
detect that before calling lazy_begin_parallel_vacuum_index().
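A minimal model of the suggested early check. ParallelContext here is a
stand-in with only the fields needed, not PostgreSQL's real struct, and
the helper name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdbool.h>

/* Stand-in for the real ParallelContext; only the fields we need. */
typedef struct ParallelContext
{
    int nworkers;           /* workers planned; 0 if setup/allocation failed */
    int nworkers_launched;  /* workers actually started */
} ParallelContext;

/*
 * If nworkers is zero we could not even plan for workers (e.g. memory
 * allocation failed), so the caller can skip
 * lazy_begin_parallel_vacuum_index() and take the plain serial path
 * without paying the parallel-mode overhead.
 */
static bool
parallel_vacuum_is_possible(const ParallelContext *pcxt)
{
    return pcxt != NULL && pcxt->nworkers > 0;
}
```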

2.
static void
+lazy_vacuum_all_indexes_for_leader(LVState *lvstate,
IndexBulkDeleteResult **stats,
+ LVTidMap *dead_tuples, bool do_parallel,
+ bool for_cleanup)
{
..
+ if (do_parallel)
+ lazy_begin_parallel_vacuum_index(lvstate, for_cleanup);
+
+ for (;;)
+ {
+ IndexBulkDeleteResult *r = NULL;
+
+ /*
+ * Get the next index number to vacuum and set index statistics. In parallel
+ * lazy vacuum, index bulk-deletion results are stored in the shared memory
+ * segment. If it's already updated we use it rather than setting it to NULL.
+ * In single vacuum, we can always use an element of the 'stats'.
+ */
+ if (do_parallel)
+ {
+ idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
+
+ if (lvshared->indstats[idx].updated)
+ r = &(lvshared->indstats[idx].stats);
+ }

It is quite possible that we are not able to launch any workers in
lazy_begin_parallel_vacuum_index(); in such cases, we should not take
the parallel-mode path. Basically, as written, we can't rely on the
'do_parallel' flag.
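In other words, the leader's loop should key off the number of workers
actually launched rather than the pre-launch flag. A small illustrative
helper (the name is hypothetical; 'shared_nprocessed' stands in for the
pg_atomic_fetch_add_u32() on lvshared->nprocessed):

```c
#include <assert.h>

/*
 * The leader picks the next index from the shared counter only when
 * workers were actually launched; otherwise it falls back to a
 * leader-local counter, avoiding any dependence on a flag computed
 * before LaunchParallelWorkers().
 */
static int
next_index_to_vacuum(int nworkers_launched,
                     int *shared_nprocessed, int *local_idx)
{
    if (nworkers_launched > 0)
        return (*shared_nprocessed)++;
    return (*local_idx)++;
}
```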

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
