Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-03-26 14:31:10
Message-ID: CAD21AoBg8CBf1OAse6ESKJmNBon14h3nAR67nJhZ=yujA+Lk4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 26, 2019 at 10:19 AM Haribabu Kommi
<kommi(dot)haribabu(at)gmail(dot)com> wrote:
>
>
> On Fri, Mar 22, 2019 at 4:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>
>>
>> Attached the updated version patch. 0001 patch allows all existing
>> vacuum options an boolean argument. 0002 patch introduces parallel
>> lazy vacuum. 0003 patch adds -P (--parallel) option to vacuumdb
>> command.
>
>
> Thanks for sharing the updated patches.

Thank you for reviewing the patch.

>
> 0001 patch:
>
> + PARALLEL [ <replaceable class="parameter">N</replaceable> ]
>
> But this patch contains syntax of PARALLEL but no explanation, I saw that
> it is explained in 0002. It is not a problem, but just mentioning.

Oops, removed it from 0001.

>
> + Specifies parallel degree for <literal>PARALLEL</literal> option. The
> + value must be at least 1. If the parallel degree
> + <replaceable class="parameter">integer</replaceable> is omitted, then
> + <command>VACUUM</command> decides the number of workers based on number of
> + indexes on the relation which further limited by
> + <xref linkend="guc-max-parallel-workers-maintenance"/>.
>
> Can we add some more details about backend participation also, parallel workers will
> come into picture only when there are 2 indexes in the table.

Agreed. I've add the description. Since such behavior might change in
a future release I've added such description.

>
> + /*
> + * Do post-vacuum cleanup and statistics update for each index if
> + * we're not in parallel lazy vacuum. If in parallel lazy vacuum, do
> + * only post-vacum cleanup and then update statistics after exited
> + * from parallel mode.
> + */
> + lazy_vacuum_all_indexes(vacrelstats, Irel, nindexes, indstats,
> + lps, true);
>
> How about renaming the above function, as it does the cleanup also?
> lazy_vacuum_or_cleanup_all_indexes?

Agreed. I think lazy_vacuum_or_cleanup_indexes would be better. Also
the same is true for lazy_vacuum_indexes_for_workers(). Fixed.

>
>
> + if (!IsInParallelVacuum(lps))
> + {
> + /*
> + * Update index statistics. If in parallel lazy vacuum, we will
> + * update them after exited from parallel mode.
> + */
> + lazy_update_index_statistics(Irel[idx], stats[idx]);
> +
> + if (stats[idx])
> + pfree(stats[idx]);
> + }
>
> The above check in lazy_vacuum_all_indexes can be combined it with the outer
> if check where the memcpy is happening. I still feel that the logic around the stats
> makes it little bit complex.

Hmm, memcpy is needed in both vacuuming index and cleanup index case
but the updating index statistics is needed in only cleanup index
case. I've split the code for parallel index vacuuming from
lazy_vacuum_or_cleanup_indexes. Also, I've changed the patch so that
both the leader process and worker processes use the same code for
index vacuuming and index cleanup. I hope the code got simple and
understandable.

>
> + if (IsParallelWorker())
> + msg = "scanned index \"%s\" to remove %d row versions by parallel vacuum worker";
> + else
> + msg = "scanned index \"%s\" to remove %d row versions";
>
> I feel, this way of error message may not be picked for the translations.
> Is there any problem if we duplicate the entire ereport message with changed message?

No, but I'd like to avoid writing the same arguments in multiple
ereport()s. I've add gettext_noop() as per comment from Horiguchi-san.

>
> + for (i = 0; i < nindexes; i++)
> + {
> + LVIndStats *s = &(copied_indstats[i]);
> +
> + if (s->updated)
> + lazy_update_index_statistics(Irel[i], &(s->stats));
> + }
> +
> + pfree(copied_indstats);
>
> why can't we use the shared memory directly to update the stats once all the workers
> are finished, instead of copying them to a local memory?

Since we cannot use heap_inplace_update() which is called by
vac_update_relstats() during parallel mode I copied the stats. Is that
safe if we destroy the parallel context *after* exited parallel mode?

>
> + tab->at_params.nworkers = 0; /* parallel lazy autovacuum is not supported */
>
> User is not required to provide workers number compulsory even that parallel vacuum can
> work, so just setting the above parameters doesn't stop the parallel workers, user must
> pass the PARALLEL option also. So mentioning that also will be helpful later when we
> start supporting it or some one who is reading the code can understand.
>

We should set -1 to at_params.nworkers here to disable parallel lazy
vacuum. And this review comment got me thinking that VACOPT_PARALLEL
can be combined with nworkers of VacuumParams. So I've removed
VACOPT_PARALELL and passing nworkers with >= 0 enables parallel lazy
vacuum.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
v20-0001-All-VACUUM-command-options-allow-an-argument.patch application/x-patch 5.9 KB
v20-0002-Add-parallel-option-to-VACUUM-command.patch application/x-patch 53.8 KB
v20-0003-Add-paralell-P-option-to-vacuumdb-command.patch application/x-patch 6.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-26 14:31:49 Re: jsonpath
Previous Message Tom Lane 2019-03-26 14:21:25 Re: pgsql: Get rid of backtracking in jsonpath_scan.l