Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Mahendra Singh <mahi6run(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-11-13 06:08:27
Message-ID: CA+fd4k4UQ1cp4G5rhzdvU6=dcMYrSg2VVekkxh5K9Unudp4sXg@mail.gmail.com
Lists: pgsql-hackers

On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >
> > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > >
> > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > > I realized that the v31-0006 patch doesn't work correctly, so I've
> > > > > > attached an updated version that also incorporates some comments I
> > > > > > got so far. Sorry for the inconvenience. I'll apply your 0001 patch
> > > > > > and also test the total delay time.
> > > > > >
> > > > > While reviewing the 0002 patch, I have one doubt about how we are
> > > > > dividing maintenance_work_mem:
> > > > >
> > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > > > +{
> > > > > + /* Compute the new maintenance_work_mem value for index vacuuming */
> > > > > + lvshared->maintenance_work_mem_worker =
> > > > > +     (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > > +     maintenance_work_mem;
> > > > > +}
> > > > > Is it fair to just consider the number of indexes that use
> > > > > maintenance_work_mem? Or do we need to consider the number of workers
> > > > > as well? My point is: suppose there are 10 indexes that will use
> > > > > maintenance_work_mem but we are launching just 2 workers; then what is
> > > > > the point in dividing maintenance_work_mem by 10?
> > > > >
> > > > > IMHO the calculation should be like this
> > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers) :
> > > > > maintenance_work_mem;
> > > > >
> > > > > Am I missing something?
> > > >
> > > > No, I think you're right. On the other hand, I think that dividing it
> > > > by the number of indexes that will use maintenance_work_mem makes
> > > > sense when the parallel degree > the number of such indexes. Suppose
> > > > the table has 2 indexes and there are 10 workers; then we should divide
> > > > maintenance_work_mem by 2 rather than by 10, because at most 2 indexes
> > > > that use maintenance_work_mem can be processed in parallel at a time.
> > > >
> > > Right, that's the reason I suggested dividing by Min(nindexes_mwm, nworkers).
> >
> > Thanks! I'll fix it in the next version patch.
> >
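To be concrete, the next version will do something like this (a rough,
untested sketch; Min() is the usual PostgreSQL macro and the variables
are the ones already used in the patch):

    lvshared->maintenance_work_mem_worker =
        (nindexes_mwm > 0) ?
        maintenance_work_mem / Min(nindexes_mwm, nworkers) :
        maintenance_work_mem;

For example, with maintenance_work_mem = 1GB, 10 indexes that use it and
2 workers, each worker would get 512MB rather than ~100MB.
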
> One more comment.
>
> +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
> +                    int nindexes, IndexBulkDeleteResult **stats,
> +                    LVParallelState *lps)
> +{
> +    ....
> +
> +    if (ParallelVacuumIsActive(lps))
> +    {
> +        lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> +                                                stats, lps);
> +    }
> +
> +    for (idx = 0; idx < nindexes; idx++)
> +    {
> +        /*
> +         * Skip indexes that we have already vacuumed during parallel index
> +         * vacuuming.
> +         */
> +        if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
> +            continue;
> +
> +        lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> +                          vacrelstats->old_live_tuples);
> +    }
> +}
>
> In this function, if ParallelVacuumIsActive, we perform the parallel
> vacuum for all the indexes for which parallel vacuum is supported, and
> once that is over we vacuum the remaining indexes, for which parallel
> vacuum is not supported, sequentially. But my question is: inside
> lazy_parallel_vacuum_or_cleanup_indexes we wait for all the workers to
> finish their job, and only then do we start the sequential vacuuming.
> Shouldn't we start it immediately, as soon as the leader's
> participation in the parallel vacuum is over?

If we do that, then while the leader process is sequentially vacuuming
the indexes that don't support parallel vacuum, some workers might still
be vacuuming other indexes. Isn't that a problem? If it's not a problem,
I think we can instead tie the indexes that don't support parallel
vacuum to the leader and have it process them during the parallel index
vacuum, as in the sketch below.
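Something like the following (a rough, untested sketch;
vacuum_indexes_leader is a hypothetical helper for the leader's share of
the parallel-safe indexes, while LaunchParallelWorkers and
WaitForParallelWorkersToFinish are the existing parallel infrastructure):

    /* Inside lazy_parallel_vacuum_or_cleanup_indexes() */
    LaunchParallelWorkers(lps->pcxt);

    /*
     * While the workers vacuum the indexes that support parallel
     * vacuum, the leader takes care of the indexes that don't (their
     * slots in the shared stats area stay NULL).
     */
    for (idx = 0; idx < nindexes; idx++)
    {
        if (IndStatsIsNull(lps->lvshared, idx))
            lazy_vacuum_index(Irel[idx], &stats[idx],
                              vacrelstats->dead_tuples,
                              vacrelstats->old_live_tuples);
    }

    /* Then the leader joins the workers on the remaining indexes */
    vacuum_indexes_leader(vacrelstats, Irel, nindexes, stats, lps);

    WaitForParallelWorkersToFinish(lps->pcxt);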

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
