Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Sergei Kornilov <sk(at)zsrv(dot)org>, Mahendra Singh <mahi6run(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <langote_amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-12-03 10:55:10
Message-ID: CAA4eK1+M7-LJmg_YuTrza_1MXa-wDEQbDSUT_GjQaPVDQN51Rg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 3, 2019 at 12:56 AM Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:

> On Sun, 1 Dec 2019 at 18:31, Sergei Kornilov <sk(at)zsrv(dot)org> wrote:
> >
> > Hi
> >
> > > I think I got your point. Your proposal is that it's more efficient
> > > if we make the leader process vacuum the indexes that can be processed
> > > only by the leader process (i.e. indexes not supporting parallel index
> > > vacuum) while the workers are processing indexes that support parallel
> > > index vacuum, right? That way, we can process indexes in parallel as
> > > much as possible.
> >
> > Right
> >
> > > So maybe we can call vacuum_or_cleanup_skipped_indexes first
> > > and then call vacuum_or_cleanup_indexes_worker. But I'm not sure
> > > there will be any parallel-safe indexes remaining after the leader
> > > has finished vacuum_or_cleanup_indexes_worker, as described in your
> > > proposal.
> >
> > I meant that after processing the skipped indexes (those not supporting
> > parallel index vacuum), the leader can start processing indexes that do
> > support parallel index vacuum, along with the parallel workers.
> > Concretely, call vacuum_or_cleanup_skipped_indexes after starting the
> > parallel workers but before vacuum_or_cleanup_indexes_worker, or
> > something with a similar effect.
> > If there are zero skipped indexes, parallel vacuum will run as in the
> > current implementation, with leader participation.
>
> I think your idea might not work well in some cases.

Good point. I am also not sure whether it is a good idea to make the
suggested change, but adding a comment along those lines is not a bad
idea, which I have done in the attached patch.
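
For reference, the ordering under discussion would look roughly like this
(a sketch only; the helper names come from the thread, while the
signatures, types, and surrounding structure are assumptions):

/*
 * Sketch of the suggested leader-side ordering.  Helper names come
 * from the thread; signatures and types are assumptions.
 */
static void
parallel_vacuum_indexes_sketch(LVParallelState *lps, int nworkers)
{
    LaunchParallelWorkers(lps->pcxt, nworkers);

    /*
     * The leader alone handles the indexes that do not support
     * parallel index vacuum, while the launched workers proceed
     * with the parallel-safe ones.
     */
    vacuum_or_cleanup_skipped_indexes(lps);

    /* Then the leader joins the workers on the remaining indexes. */
    vacuum_or_cleanup_indexes_worker(lps);

    WaitForParallelWorkersToFinish(lps->pcxt);
}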

I have made some other changes as well.
1.
+ if (VacuumSharedCostBalance != NULL)
  {
-     double msec;
+     int nworkers = pg_atomic_read_u32(VacuumActiveNWorkers);
+
+     /* At least count itself */
+     Assert(nworkers >= 1);
+
+     /* Update the shared cost balance value atomically */
+     while (true)
+     {
+         uint32 shared_balance;
+         uint32 new_balance;
+         uint32 local_balance;
+
+         msec = 0;
+
+         /* compute new balance by adding the local value */
+         shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
+         new_balance = shared_balance + VacuumCostBalance;
+         /* also compute the total local balance */
+         local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
+
+         if ((new_balance >= VacuumCostLimit) &&
+             (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
+         {
+             /* compute sleep time based on the local cost balance */
+             msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
+             new_balance = shared_balance - VacuumCostBalanceLocal;
+             VacuumCostBalanceLocal = 0;
+         }
+
+         if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
+                                            &shared_balance,
+                                            new_balance))
+         {
+             /* Updated successfully, break */
+             break;
+         }
+     }
+
+     VacuumCostBalanceLocal += VacuumCostBalance;

I see multiple problems with this code:
(a) If VacuumSharedCostBalance is changed by the time of the
compare-and-exchange, the next iteration of the loop might not compute the
correct values, because VacuumCostBalanceLocal might already have been
reset by then.
(b) In the line "new_balance = shared_balance - VacuumCostBalanceLocal",
you need to use new_balance instead of shared_balance; otherwise it won't
account for the balance of the latest cycle.
(c) In the line "msec = VacuumCostDelay * VacuumCostBalanceLocal /
VacuumCostLimit", I think you need to use local_balance, for reasons
similar to (b).
(d) I think we can write this code with fewer variables.

I have fixed all these problems and used a slightly different way to
compute the parallel delay. See compute_parallel_delay() in the attached
delta patch.
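
For illustration, the computation can avoid the retry loop altogether by
adding this cycle's cost to the shared balance atomically. A minimal
sketch using the variable names from the quoted code (the authoritative
version is compute_parallel_delay() in the attached patch):

/*
 * Sketch only: accumulate this cycle's cost into the shared balance
 * with an atomic add, so no retry loop is needed and a concurrent
 * update cannot invalidate locally computed values.
 */
static double
compute_parallel_delay_sketch(void)
{
    double      msec = 0;
    uint32      shared_balance;
    int         nworkers = pg_atomic_read_u32(VacuumActiveNWorkers);

    /* At least count itself */
    Assert(nworkers >= 1);

    /* Add the latest cycle's cost to the shared balance atomically */
    shared_balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance,
                                             VacuumCostBalance);

    /* Fold the latest cycle into the local balance as well */
    VacuumCostBalanceLocal += VacuumCostBalance;

    if ((shared_balance >= VacuumCostLimit) &&
        (VacuumCostBalanceLocal > 0.5 * ((double) VacuumCostLimit / nworkers)))
    {
        /* Sleep in proportion to the cost this process accumulated */
        msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
        pg_atomic_sub_fetch_u32(VacuumSharedCostBalance,
                                VacuumCostBalanceLocal);
        VacuumCostBalanceLocal = 0;
    }

    return msec;
}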

2.
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+     /*
+      * Reset the local value so that we compute cost balance during
+      * parallel index vacuuming.
+      */
+     VacuumCostBalance = 0;
+     VacuumCostBalanceLocal = 0;
+
+     LaunchParallelWorkers(lps->pcxt, nworkers);
+
+     /* Enable shared costing iff we process indexes in parallel. */
+     if (lps->pcxt->nworkers_launched > 0)
+     {
+         /* Enable shared cost balance */
+         VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
+         VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
+
+         /*
+          * Set up shared cost balance and the number of active workers for
+          * vacuum delay.
+          */
+         pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
+         pg_atomic_write_u32(VacuumActiveNWorkers, 0);

This code has issues. We can't initialize
VacuumSharedCostBalance/VacuumActiveNWorkers after launching the workers,
because by that time some worker could already have read or changed those
values. This was reported off-list by Mahendra, and I have fixed it.
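
In other words, the shared state has to be fully set up before the workers
are launched, roughly like this (a sketch based on the quoted code):

/* Initialize the shared cost-based delay state first ... */
VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
pg_atomic_write_u32(VacuumActiveNWorkers, 0);

/* ... and only then launch the workers that will read and update it */
LaunchParallelWorkers(lps->pcxt, nworkers);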

3. Changed the names of functions that were too long; I think the new
names are more meaningful. If you don't agree with these changes, we can
discuss them.

4. Changed the order of parameters in many functions to match the
existing code.

5. Refactored the code at a few places so that it can be easy to follow.

6. Added/edited many comments and made other cosmetic changes.

You can find all these changes in v35-0003-Code-review-amit.patch.

A few other things I would like you to consider:
1. I think the disable_parallel_leader_participation-related code can be
extracted into a separate patch, as it is mainly a debug/test aid. You can
also fix the problem reported by Mahendra in that context.

2. I think it would be better if we could somehow disallow very small
indexes from using parallel workers. Can we use min_parallel_index_scan_size
to decide whether a particular index can participate in a parallel vacuum?
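
Something along these lines, perhaps (a sketch only; the helper name and
its placement are hypothetical, while min_parallel_index_scan_size and
RelationGetNumberOfBlocks() are existing PostgreSQL symbols):

/*
 * Hypothetical helper: allow an index to participate in parallel
 * vacuum only if it is at least min_parallel_index_scan_size pages.
 */
static bool
index_participates_in_parallel_vacuum(Relation indrel)
{
    return RelationGetNumberOfBlocks(indrel) >= min_parallel_index_scan_size;
}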

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v35-0003-Add-parallel-option-to-VACUUM-command.patch application/octet-stream 75.9 KB
v35-0003-Code-review-amit.patch application/octet-stream 39.0 KB
