Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-10-02 13:58:26
Message-ID: CAD21AoAdue0hHLpPFMicRzfe8KX=CyLa257cWmkAfFoo3-YtOA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Since the previous version patch conflicts with current HEAD, I've
> > attached the updated version patches.
> >
>

Thank you for reviewing this patch!

> Review comments:
> ------------------------------
> *
> indexes on the relation which further limited by
> + <xref linkend="guc-max-parallel-workers-maintenance"/>.
>
> /which further/which is further
>

Fixed.

> *
> + * index vacuuming or index cleanup, we launch parallel worker processes. Once
> + * all indexes are processed the parallel worker processes exit and the leader
> + * process re-initializes the DSM segment while keeping recorded dead tuples.
>
> It is not clear from this comment why it re-initializes the DSM segment
> instead of destroying it once the index work is done by the workers. Can
> you elaborate a bit more in the comment?

Added more explanation.
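
To illustrate the flow described there, here is a minimal leader-side
sketch using the core parallel APIs; LVParallelState, lps,
lazy_scan_heap_chunk() and parallel_vacuum_rounds() are placeholder
names for this sketch, not the patch's actual identifiers:

#include "postgres.h"
#include "access/parallel.h"

/* Placeholder standing in for the patch's parallel-vacuum state */
typedef struct LVParallelState
{
    ParallelContext *pcxt;
} LVParallelState;

/*
 * Hypothetical helper: fills the shared dead-tuple array and returns
 * true once the heap scan is complete.
 */
extern bool lazy_scan_heap_chunk(LVParallelState *lps);

static void
parallel_vacuum_rounds(LVParallelState *lps)
{
    bool        heap_scan_done = false;

    while (!heap_scan_done)
    {
        /* Accumulate dead tuples until maintenance_work_mem fills up */
        heap_scan_done = lazy_scan_heap_chunk(lps);

        /* One round of parallel index vacuuming (or cleanup) */
        LaunchParallelWorkers(lps->pcxt);
        WaitForParallelWorkersToFinish(lps->pcxt);

        /*
         * Reuse the same parallel context for the next round:
         * re-initialize the DSM segment but keep the recorded dead
         * tuples, which the workers only read.
         */
        if (!heap_scan_done)
            ReinitializeParallelDSM(lps->pcxt);
    }

    DestroyParallelContext(lps->pcxt);
}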

>
> *
> + * Note that all parallel workers live during one either index vacuuming or
>
> It seems usage of 'one' is not required in the above sentence.

Removed.

>
> *
> +
> +/*
> + * Compute the number of parallel worker process to request.
>
> /process/processes

Fixed.

>
> *
> +static int
> +compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
> +{
> + int parallel_workers = 0;
> +
> + Assert(nrequested >= 0);
> +
> + if (nindexes <= 1)
> + return 0;
>
> I think here, at the beginning, you can also check whether
> max_parallel_maintenance_workers is 0 and, if so, return.
>

Agreed, fixed.

> *
> In function compute_parallel_workers, don't we want to cap the number
> of workers based on maintenance_work_mem as we do in
> plan_create_index_workers?
>
> The basic point is how we want to treat maintenance_work_mem for
> this feature. Do we want all workers together to use at most
> maintenance_work_mem, or is each worker allowed to use
> maintenance_work_mem? I would prefer the former unless we have a good
> reason to follow the latter strategy.
>
> Accordingly, we might need to update the below paragraph in docs:
> "Note that parallel utility commands should not consume substantially
> more memory than equivalent non-parallel operations. This strategy
> differs from that of parallel query, where resource limits generally
> apply per worker process. Parallel utility commands treat the
> resource limit <varname>maintenance_work_mem</varname> as a limit to
> be applied to the entire utility command, regardless of the number of
> parallel worker processes."

I'd also prefer that parallel vacuum use at most maintenance_work_mem
regardless of the number of parallel workers, and that is the current
implementation. In lazy vacuum, maintenance_work_mem is used to record
the item pointers of dead tuples. This is done by the leader process,
and the worker processes just read them when vacuuming dead index
tuples. Even if the user sets a small maintenance_work_mem, parallel
vacuum would still be helpful because index vacuuming still takes time.
So I thought we should cap the number of parallel workers by the number
of indexes rather than by maintenance_work_mem.
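
To make that concrete, here is a sketch of how compute_parallel_workers()
could cap the worker count by the number of indexes rather than by
maintenance_work_mem; it follows the hunk quoted above, but the details
are only illustrative and may differ from the actual patch:

#include "postgres.h"
#include "miscadmin.h"          /* max_parallel_maintenance_workers */
#include "utils/rel.h"

static int
compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
{
    int         parallel_workers = 0;

    Assert(nrequested >= 0);

    /* Bail out early when parallelism cannot help or is disabled */
    if (nindexes <= 1 || max_parallel_maintenance_workers == 0)
        return 0;

    if (nrequested > 0)
    {
        /* At least one index is taken by the leader process */
        parallel_workers = Min(nrequested, nindexes - 1);
    }
    else
    {
        /* Choose based on the number of indexes, not on memory */
        parallel_workers = nindexes - 1;
    }

    /* Cap by the GUC; maintenance_work_mem does not limit the count */
    return Min(parallel_workers, max_parallel_maintenance_workers);
}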

>
> *
> +static int
> +compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
> +{
> + int parallel_workers = 0;
> +
> + Assert(nrequested >= 0);
> +
> + if (nindexes <= 1)
> + return 0;
> +
> + if (nrequested > 0)
> + {
> + /* At least one index is taken by the leader process */
> + parallel_workers = Min(nrequested, nindexes - 1);
> + }
>
> I think here we always allow the leader to participate. It seems to
> me we should have some way to disable leader participation. During the
> development of previous parallel operations, we found it quite handy
> for catching bugs. We might want to mimic what has been done for
> parallel index builds with DISABLE_LEADER_PARTICIPATION.

Added the way to disable leader participation.
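
For reference, a sketch of the nbtsort.c-style switch; the function
below and vacuum_indexes_leader() are illustrative placeholders, not
the patch's exact code:

/* Uncomment to test index vacuuming without leader participation */
/* #define DISABLE_LEADER_PARTICIPATION */

static void
lazy_parallel_vacuum_indexes(LVParallelState *lps, Relation *Irel,
                             int nindexes)
{
    LaunchParallelWorkers(lps->pcxt);

#ifndef DISABLE_LEADER_PARTICIPATION
    /* The leader also processes indexes unless participation is disabled */
    vacuum_indexes_leader(lps, Irel, nindexes);
#endif

    WaitForParallelWorkersToFinish(lps->pcxt);
}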

>
> *
> +/*
> + * DSM keys for parallel lazy vacuum. Unlike other parallel execution code,
> + * since we don't need to worry about DSM keys conflicting with plan_node_id
> + * we can use small integers.
> + */
> +#define PARALLEL_VACUUM_KEY_SHARED 1
> +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
> +#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
>
> I think it would be better if these keys were assigned numbers in
> the way we do for other similar operations like CREATE INDEX. See the
> defines below in the code:
> /* Magic numbers for parallel state sharing */
> #define PARALLEL_KEY_BTREE_SHARED UINT64CONST(0xA000000000000001)
>
> This will make the code consistent with other parallel operations.

I skipped this comment according to your previous mail.

>
> *
> +begin_parallel_vacuum(LVRelStats *vacrelstats, Oid relid, BlockNumber nblocks,
> + int nindexes, int nrequested)
> {
> ..
> + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
> ..
> }
>
> I think here you should use SizeOfLVDeadTuples as defined by patch.

Fixed.
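
For context, the macro is presumably a flexible-array-member idiom
along these lines; this is only a guess at its shape, not the patch's
actual definition:

#include "postgres.h"
#include "storage/itemptr.h"

typedef struct LVDeadTuples
{
    int             max_tuples;     /* # slots allocated */
    int             num_tuples;     /* # slots currently in use */
    ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER];
} LVDeadTuples;

/* Header size only; the item pointer array is sized separately */
#define SizeOfLVDeadTuples  offsetof(LVDeadTuples, itemptrs)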

>
> *
> + keys++;
> +
> + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> + maxtuples = compute_max_dead_tuples(nblocks, true);
> + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
> + mul_size(sizeof(ItemPointerData), maxtuples)));
> + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
> + keys++;
> +
> + shm_toc_estimate_keys(&pcxt->estimator, keys);
> +
> + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
> + querylen = strlen(debug_query_string);
> + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
> + shm_toc_estimate_keys(&pcxt->estimator, 1);
>
> The code style looks inconsistent here. In some cases, you are
> calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> and in other cases, you are accumulating keys. I think it is better
> to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> in all cases.

Fixed. But some code calls shm_toc_estimate_keys for multiple keys at
once, for example in nbtsort.c and parallel.c. What is the difference?
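
For illustration, the style being asked for pairs each chunk estimate
with its key estimate, roughly as below; the sizes and macros are taken
from the hunk quoted above, while the surrounding code is assumed:

/* Estimate space for the dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
maxtuples = compute_max_dead_tuples(nblocks, true);
est_deadtuples = MAXALIGN(add_size(SizeOfLVDeadTuples,
                                   mul_size(sizeof(ItemPointerData),
                                            maxtuples)));
shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
shm_toc_estimate_keys(&pcxt->estimator, 1);

/* Estimate space for the query text -- PARALLEL_VACUUM_KEY_QUERY_TEXT */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
shm_toc_estimate_keys(&pcxt->estimator, 1);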

>
> *
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
> {
> ..
> + /* Set debug_query_string for individual workers */
> + sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
> ..
> }
>
> I think the last parameter in shm_toc_lookup should be false. Is
> there a reason for passing it as true?

My bad, fixed.

>
> *
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
> +{
> ..
> + /* Open table */
> + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);
> ..
> }
>
> I don't think it is a good idea to assume the lock mode is
> ShareUpdateExclusiveLock here. If, for some reason, the lock level for
> the vacuum process changes tomorrow, we might forget to update it here.
> I think it is better if we can get this information from the master
> backend.

So did you mean declaring the lock mode for lazy vacuum somewhere as a
global variable and using it both in try_relation_open in the leader
process and in relation_open in the worker processes? Otherwise we
would end up adding something like shared->lmode =
ShareUpdateExclusiveLock during parallel context initialization, which
does not seem to resolve your concern.
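
Concretely, the shared-state variant mentioned above would look roughly
like this; LVShared and its lmode field are hypothetical names for this
sketch:

/* Leader, while initializing the parallel context */
shared->relid = RelationGetRelid(onerel);
shared->lmode = ShareUpdateExclusiveLock;   /* still hard-coded here */

/* Worker, in heap_parallel_vacuum_main() */
lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
                                       false);
onerel = heap_open(lvshared->relid, lvshared->lmode);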

>
> *
> +end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes)
> {
> ..
> + /* Shutdown worker processes and destroy the parallel context */
> + WaitForParallelWorkersToFinish(lps->pcxt);
> ..
> }
>
> Do we really need to call WaitForParallelWorkersToFinish here as it
> must have been called in lazy_parallel_vacuum_or_cleanup_indexes
> before this time?

No, removed.

I've attached an updated version of the patch that incorporates your
comments, except for those that need more discussion. I'll update the
patch again after that discussion.

Regards,

--
Masahiko Sawada

Attachment Content-Type Size
v26-0001-Add-parallel-option-to-VACUUM-command.patch text/x-patch 54.4 KB
v26-0002-Add-paralell-P-option-to-vacuumdb-command.patch text/x-patch 5.9 KB
