Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Mahendra Singh <mahi6run(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-11-12 02:12:58
Message-ID: CA+fd4k4T2egAfj5TY9WsYrqDfPamLAM-p8o14WCwKJn8z5jkMg@mail.gmail.com
Lists: pgsql-hackers

On Mon, 11 Nov 2019 at 19:29, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> >
> > On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada
> > > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > > >
> > > > Good point. gin and bloom do a certain amount of heavy work during
> > > > both bulkdelete and cleanup, as you mentioned. Brin does it during
> > > > cleanup, and hash and gist do it during bulkdelete. So there are three
> > > > types of index AM even among those in core PostgreSQL. An idea I came
> > > > up with is that we can control parallel vacuum and parallel cleanup
> > > > separately. That is, we add a variable amcanparallelcleanup and do
> > > > parallel cleanup only on indexes for which amcanparallelcleanup is true.
> > > >
>
> This is what I mentioned in my email as a second option (whether to
> expose via IndexAM). I am not sure if we can have a new variable just
> for this.
>
> > > > IndexBulkDeleteResult
> > > > can be stored locally if both amcanparallelvacuum and
> > > > amcanparallelcleanup of an index are false, because only the leader
> > > > process deals with such indexes. Otherwise we need to store it in DSM
> > > > as always.
> > > >
> > > IIUC, amcanparallelcleanup will be true for those indexes which do
> > > heavy work during cleanup irrespective of whether bulkdelete is called
> > > or not, e.g. gin?
> >
> > Yes, I guess that gin and brin set amcanparallelcleanup to true (gin
> > might set amcanparallelvacuum to true as well).
> >
> > > If so, along with an amcanparallelcleanup flag, we
> > > need to consider vacrelstats->num_index_scans, right? So if
> > > vacrelstats->num_index_scans == 0 then we need to launch parallel
> > > workers for all the indexes that support amcanparallelvacuum, and if
> > > vacrelstats->num_index_scans > 0 then only for those that have
> > > amcanparallelcleanup as true.
> >
> > Yes, you're right. But this won't work well for brin indexes, which
> > don't want to participate in parallel vacuum but always want to
> > participate in parallel cleanup.
> >
> > After more thought, I think we can have a ternary value: never,
> > always, once. If it's 'never', the index never participates in parallel
> > cleanup. I guess hash indexes use 'never'. Next, if it's 'always', the
> > index always participates regardless of vacrelstats->num_index_scans. I
> > guess gin, brin and bloom use 'always'. Finally, if it's 'once', the
> > index participates in parallel cleanup only when cleanup is the first
> > pass over the index (that is, vacrelstats->num_index_scans == 0). I
> > guess btree, gist and spgist use 'once'.
> >
>
> I think this 'once' option is confusing, especially because it also
> depends on 'num_index_scans', which the IndexAM has no control over.
> It might be that the option name is not good, but I am not sure.
> Another thing is that for brin indexes, we don't want bulkdelete to
> participate in parallelism.

I thought brin should set amcanparallelvacuum to false and
amcanparallelcleanup to 'always'.
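For illustration, the 'never' / 'always' / 'once' proposal and the brin setting can be sketched in C as below. This is a minimal sketch, not PostgreSQL code: ParallelCleanupOption, IndexParallelCaps and participates_in_parallel_cleanup are hypothetical names, and the per-AM assignments in the comments are only the guesses made in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical per-AM setting for parallel index cleanup, following the
 * 'never' / 'always' / 'once' proposal. Not part of PostgreSQL.
 */
typedef enum ParallelCleanupOption
{
    PARALLEL_CLEANUP_NEVER,   /* guessed for hash: leader does cleanup */
    PARALLEL_CLEANUP_ALWAYS,  /* guessed for gin, brin, bloom */
    PARALLEL_CLEANUP_ONCE     /* guessed for btree, gist, spgist */
} ParallelCleanupOption;

/* Hypothetical capability flags of one index AM. */
typedef struct IndexParallelCaps
{
    bool canparallelvacuum;         /* may ambulkdelete run in a worker? */
    ParallelCleanupOption cleanup;  /* may amvacuumcleanup run in a worker? */
} IndexParallelCaps;

/*
 * Decide whether an index takes part in parallel cleanup. num_index_scans
 * plays the role of vacrelstats->num_index_scans: the number of bulkdelete
 * passes already done on this table during the current vacuum.
 */
bool
participates_in_parallel_cleanup(const IndexParallelCaps *caps,
                                 int num_index_scans)
{
    switch (caps->cleanup)
    {
        case PARALLEL_CLEANUP_NEVER:
            return false;
        case PARALLEL_CLEANUP_ALWAYS:
            return true;
        case PARALLEL_CLEANUP_ONCE:
            /* only if cleanup is the first pass over the index */
            return num_index_scans == 0;
    }
    assert(!"unknown ParallelCleanupOption");
    return false;
}
```

With this, brin would be described as {false, PARALLEL_CLEANUP_ALWAYS}: it never joins parallel bulkdelete but always joins parallel cleanup, whatever num_index_scans is.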

> Do we want to have separate variables for
> ambulkdelete and amvacuumcleanup which decides whether the particular
> phase can be done in parallel?

You mean adding variables to ambulkdelete and amvacuumcleanup as
function arguments? If so, isn't it too late to tell the leader whether
the particular phase can be done in parallel?
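The timing concern can be made concrete with a sketch of the leader's planning step. It assumes hypothetical static per-index flags (IndexVacuumCaps) and a hypothetical plan_parallel_vacuum() function, and for simplicity it also assumes at most one worker per index. The point is that the leader sizes shared memory and chooses worker counts before any ambulkdelete or amvacuumcleanup call, so flags passed as arguments to those callbacks would arrive too late. It also follows the idea of keeping the stats of leader-only indexes in local memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical static capabilities, readable before any AM callback runs. */
typedef struct IndexVacuumCaps
{
    bool canparallelvacuum;   /* bulkdelete can run in a worker */
    bool canparallelcleanup;  /* cleanup can run in a worker */
} IndexVacuumCaps;

/* Hypothetical plan the leader computes up front. */
typedef struct ParallelVacuumPlan
{
    int nindexes_in_dsm;   /* indexes whose stats must live in DSM */
    int nworkers_bulkdel;  /* workers useful for the bulkdelete phase */
    int nworkers_cleanup;  /* workers useful for the cleanup phase */
} ParallelVacuumPlan;

/*
 * Plan a parallel vacuum from static per-index flags only. Nothing here
 * calls into the index AM, so it can run before workers are launched.
 */
ParallelVacuumPlan
plan_parallel_vacuum(const IndexVacuumCaps *caps, int nindexes,
                     int max_workers)
{
    ParallelVacuumPlan plan = {0, 0, 0};

    assert(nindexes >= 0 && max_workers >= 0);
    for (int i = 0; i < nindexes; i++)
    {
        if (caps[i].canparallelvacuum)
            plan.nworkers_bulkdel++;
        if (caps[i].canparallelcleanup)
            plan.nworkers_cleanup++;
        /* stats of indexes only the leader touches can stay local */
        if (caps[i].canparallelvacuum || caps[i].canparallelcleanup)
            plan.nindexes_in_dsm++;
    }
    /* assumption: at most one worker per index, capped by the limit */
    if (plan.nworkers_bulkdel > max_workers)
        plan.nworkers_bulkdel = max_workers;
    if (plan.nworkers_cleanup > max_workers)
        plan.nworkers_cleanup = max_workers;
    return plan;
}
```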

> Another possibility could be to just
> have one variable (say uint16 amparallelvacuum) which will tell us all
> the options, but I don't think that will be a popular approach
> considering all the other methods and variables exposed. What do you
> think?

Adding a single variable that holds flags, instead of a separate
variable for each option, would also be a good idea. For instance, the
FDW API uses such an interface (see the eflags argument of BeginForeignScan).
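A single flags field along those lines might look like the sketch below. The field name amparallelvacuum comes from the suggestion quoted above; the flag names and helper functions are hypothetical, chosen only to show how the 'never' / 'always' / 'once' cases and the brin case can be encoded as bits, in the same spirit as the eflags bitmask of BeginForeignScan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bits for a single uint16 amparallelvacuum field.
 * Illustrative names only; not PostgreSQL definitions.
 */
#define PARALLEL_VACUUM_NONE          0x0000
#define PARALLEL_VACUUM_BULKDEL       0x0001 /* ambulkdelete may run in a worker */
#define PARALLEL_VACUUM_CLEANUP       0x0002 /* amvacuumcleanup may always run in a worker */
#define PARALLEL_VACUUM_COND_CLEANUP  0x0004 /* ...only if no bulkdelete pass ran */
#define PARALLEL_VACUUM_ALL_FLAGS \
    (PARALLEL_VACUUM_BULKDEL | PARALLEL_VACUUM_CLEANUP | \
     PARALLEL_VACUUM_COND_CLEANUP)

/* Can this index's bulkdelete phase be handed to a worker? */
bool
can_parallel_bulkdelete(uint16_t amparallelvacuum)
{
    assert((amparallelvacuum & ~PARALLEL_VACUUM_ALL_FLAGS) == 0);
    return (amparallelvacuum & PARALLEL_VACUUM_BULKDEL) != 0;
}

/* Can this index's cleanup phase be handed to a worker? */
bool
can_parallel_cleanup(uint16_t amparallelvacuum, int num_index_scans)
{
    assert((amparallelvacuum & ~PARALLEL_VACUUM_ALL_FLAGS) == 0);
    if (amparallelvacuum & PARALLEL_VACUUM_CLEANUP)
        return true;                    /* the 'always' case */
    if (amparallelvacuum & PARALLEL_VACUUM_COND_CLEANUP)
        return num_index_scans == 0;    /* the 'once' case */
    return false;                       /* the 'never' case */
}
```

Here 'never' is simply the absence of both cleanup bits, brin would use PARALLEL_VACUUM_CLEANUP alone, and btree would combine PARALLEL_VACUUM_BULKDEL with PARALLEL_VACUUM_COND_CLEANUP. New options could later be added as new bits without adding more fields to IndexAmRoutine.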

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
