Re: [HACKERS] Block level parallel vacuum

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-10-05 07:36:27
Message-ID: CAFiTN-uPDNz9SYr_8rZo7C3K-Vf2cZJa2WUtgApY26Detwf1-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 4, 2019 at 3:35 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Oct 4, 2019 at 11:01 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >>
> Some more comments..
> 1.
> + for (idx = 0; idx < nindexes; idx++)
> + {
> + if (!for_cleanup)
> + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> + vacrelstats->old_live_tuples);
> + else
> + {
> + /* Cleanup one index and update index statistics */
> + lazy_cleanup_index(Irel[idx], &stats[idx], vacrelstats->new_rel_tuples,
> + vacrelstats->tupcount_pages < vacrelstats->rel_pages);
> +
> + lazy_update_index_statistics(Irel[idx], stats[idx]);
> +
> + if (stats[idx])
> + pfree(stats[idx]);
> + }
>
> I think instead of checking for_cleanup variable for every index of
> the loop we better move loop inside, like shown below?
>
> if (!for_cleanup)
> for (idx = 0; idx < nindexes; idx++)
> lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> else
> for (idx = 0; idx < nindexes; idx++)
> {
> lazy_cleanup_index
> lazy_update_index_statistics
> ...
> }
>
> 2.
> +static void
> +lazy_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps, bool for_cleanup)
> +{
> + int idx;
> +
> + Assert(!IsParallelWorker());
> +
> + /* no job if the table has no index */
> + if (nindexes <= 0)
> + return;
>
> Wouldn't it be good idea to call this function only if nindexes > 0?
>
> 3.
> +/*
> + * Vacuum or cleanup indexes with parallel workers. This function must be used
> + * by the parallel vacuum leader process.
> + */
> +static void
> +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats,
> Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps, bool for_cleanup)
>
> If you see this function there is no much common code between
> for_cleanup and without for_cleanup except these 3-4 statement.
> LaunchParallelWorkers(lps->pcxt);
> /* Create the log message to report */
> initStringInfo(&buf);
> ...
> /* Wait for all vacuum workers to finish */
> WaitForParallelWorkersToFinish(lps->pcxt);
>
> Other than that you have got a lot of checks like this
> + if (!for_cleanup)
> + {
> + }
> + else
> + {
> }
>
> I think code would be much redable if we have 2 functions one for
> vaccum (lazy_parallel_vacuum_indexes) and another for
> cleanup(lazy_parallel_cleanup_indexes).
>
> 4.
> * of index scans performed. So we don't use maintenance_work_mem memory for
> * the TID array, just enough to hold as many heap tuples as fit on one page.
> *
> + * Lazy vacuum supports parallel execution with parallel worker processes. In
> + * parallel lazy vacuum, we perform both index vacuuming and index cleanup with
> + * parallel worker processes. Individual indexes are processed by one vacuum
>
> Spacing after the "." is not uniform, previous comment is using 2
> space and newly
> added is using 1 space.

Few more comments
----------------------------

1.
+static int
+compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
+{
+ int parallel_workers;
+ bool leaderparticipates = true;

Seems like this function is not using onerel parameter so we can remove this.

2.
+
+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
+ maxtuples = compute_max_dead_tuples(nblocks, true);
+ est_deadtuples = MAXALIGN(add_size(SizeOfLVDeadTuples,
+ mul_size(sizeof(ItemPointerData), maxtuples)));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
+ querylen = strlen(debug_query_string);

for consistency with other comments change
VACUUM_KEY_QUERY_TEXT to PARALLEL_VACUUM_KEY_QUERY_TEXT

3.
@@ -2888,6 +2888,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
(!wraparound ? VACOPT_SKIP_LOCKED : 0);
tab->at_params.index_cleanup = VACOPT_TERNARY_DEFAULT;
tab->at_params.truncate = VACOPT_TERNARY_DEFAULT;
+ /* parallel lazy vacuum is not supported for autovacuum */
+ tab->at_params.nworkers = -1;

What is the reason for the same? Can we explain in the comments?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dent John 2019-10-05 10:27:49 Re: The flinfo->fn_extra question, from me this time.
Previous Message Tom Lane 2019-10-05 03:55:21 Re: format of pg_upgrade loadable_libraries warning