Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Amit Langote <langote_amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Claudio Freire <klaussfreire(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2020-01-17 05:13:49
Message-ID: CAA4eK1+UnM=MMfepqjXkqbz_TxxDPGYT7sacP2vcU6Bh=JWnnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> I have few small comments.
>
> 1.
> logical streaming for large in-progress transactions+
> + /* Can't perform vacuum in parallel */
> + if (parallel_workers <= 0)
> + {
> + pfree(can_parallel_vacuum);
> + return lps;
> + }
>
> why are we checking parallel_workers <= 0, Function
> compute_parallel_vacuum_workers only returns 0 or greater than 0
> so isn't it better to just check if (parallel_workers == 0) ?
>

Why to have such an assumption about
compute_parallel_vacuum_workers()? The function
compute_parallel_vacuum_workers() returns int, so such a check
(<= 0) seems reasonable to me.

> 2.
> +/*
> + * Macro to check if we are in a parallel vacuum. If true, we are in the
> + * parallel mode and the DSM segment is initialized.
> + */
> +#define ParallelVacuumIsActive(lps) (((LVParallelState *) (lps)) != NULL)
>
> (LVParallelState *) (lps) -> this typecast is not required, just (lps)
> != NULL should be enough.
>

I think the better idea would be to just replace it PointerIsValid
like below. I see similar usage in other places.
#define ParallelVacuumIsActive(lps) PointerIsValid(lps)

> 3.
>
> + shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
> + prepare_index_statistics(shared, can_parallel_vacuum, nindexes);
> + pg_atomic_init_u32(&(shared->idx), 0);
> + pg_atomic_init_u32(&(shared->cost_balance), 0);
> + pg_atomic_init_u32(&(shared->active_nworkers), 0);
>
> I think it will look cleaner if we can initialize in the order they
> are declared in structure.
>

Okay.

> 4.
> + VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
> + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
> +
> + /*
> + * Set up shared cost balance and the number of active workers for
> + * vacuum delay.
> + */
> + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
> + pg_atomic_write_u32(VacuumActiveNWorkers, 0);
> +
> + /*
> + * The number of workers can vary between bulkdelete and cleanup
> + * phase.
> + */
> + ReinitializeParallelWorkers(lps->pcxt, nworkers);
> +
> + LaunchParallelWorkers(lps->pcxt);
> +
> + if (lps->pcxt->nworkers_launched > 0)
> + {
> + /*
> + * Reset the local cost values for leader backend as we have
> + * already accumulated the remaining balance of heap.
> + */
> + VacuumCostBalance = 0;
> + VacuumCostBalanceLocal = 0;
> + }
> + else
> + {
> + /*
> + * Disable shared cost balance if we are not able to launch
> + * workers.
> + */
> + VacuumSharedCostBalance = NULL;
> + VacuumActiveNWorkers = NULL;
> + }
> +
>
> I don't like the idea of first initializing the
> VacuumSharedCostBalance with lps->lvshared->cost_balance and then
> uninitialize if nworkers_launched is 0.
> I am not sure why do we need to initialize VacuumSharedCostBalance
> here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance,
> VacuumCostBalance);?
> I think we can initialize it only if nworkers_launched > 0 then we can
> get rid of the else branch completely.
>

No, we can't initialize after nworkers_launched > 0 because by that
time some workers would have already tried to access the shared cost
balance. So, it needs to be done before launching the workers as is
done in code. We can probably add a comment.

>
> + /* Carry the shared balance value to heap scan */
> + if (VacuumSharedCostBalance)
> + VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
> +
> + if (nworkers > 0)
> + {
> + /* Disable shared cost balance */
> + VacuumSharedCostBalance = NULL;
> + VacuumActiveNWorkers = NULL;
> + }
>
> Doesn't make sense to keep them as two conditions, we can combine them as below
>
> /* If shared costing is enable, carry the shared balance value to heap
> scan and disable the shared costing */
> if (VacuumSharedCostBalance)
> {
> VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
> VacuumSharedCostBalance = NULL;
> VacuumActiveNWorkers = NULL;
> }
>

makes sense to me, will change.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-01-17 05:30:04 Re: [HACKERS] Block level parallel vacuum
Previous Message Michael Paquier 2020-01-17 04:47:01 Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node