Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Amit Langote <langote_amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Claudio Freire <klaussfreire(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2020-01-13 03:50:27
Message-ID: CAA4eK1LotP=oge4GjSjZtxC0EWknOAkuSDmpn_3pw=KNUDn4mA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> On Sat, 11 Jan 2020 at 13:18, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada
> > <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > >
> > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk(at)zsrv(dot)org> wrote:
> > > > >
> > > > > Hi
> > > > > Thank you for update! I looked again
> > > > >
> > > > > (vacuum_indexes_leader)
> > > > > + /* Skip the indexes that can be processed by parallel workers */
> > > > > + if (!skip_index)
> > > > > + continue;
> > > > >
> > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> > > >
> > > > I also agree with your point.
> > >
> > > I don't think the change is a good idea.
> > >
> > > - bool skip_index = (get_indstats(lps->lvshared, i) == NULL ||
> > > - skip_parallel_vacuum_index(Irel[i], lps->lvshared));
> > > + bool can_parallel = (get_indstats(lps->lvshared, i) == NULL ||
> > > + skip_parallel_vacuum_index(Irel[i],
> > > + lps->lvshared));
> > >
> > > The above condition is true when the index can *not* do parallel index vacuum. How about changing it to skipped_index and change the comment to something like “We are interested in only index skipped parallel vacuum”?
> > >
> >
> > Hmm, I find the current code and comment better than what you or
> > Sergei are proposing. I am not sure what is the point of confusion in
> > the current code?
>
> Yeah the current code is also good. I just thought they were concerned
> that the variable name skip_index might be confusing because we skip
> if skip_index is NOT true.
>

Okay, would it better if we get rid of this variable and have code like below?

/* Skip the indexes that can be processed by parallel workers */
if ( !(get_indstats(lps->lvshared, i) == NULL ||
skip_parallel_vacuum_index(Irel[i], lps->lvshared)))
continue;
...

> >
> > > >
> > > > >
> > > > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire database (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit:
> > > > >
> > > > > + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> > > > > + {
> > > > > + ereport(WARNING,
> > > > > + (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > > > > + RelationGetRelationName(onerel))));
> > > > > + params->nworkers = -1;
> > > > > + }
> > > > >
> > > > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case?
> > > >
> > > > Good point.
> > > > Yes, we should improve this. I tried to fix this.
> > >
> > > +1
> > >
> >
> > Yeah, we can improve the situation here. I think we don't need to
> > change the value of params->nworkers at first place if allow
> > lazy_scan_heap to take care of this. Also, I think we shouldn't
> > display warning unless the user has explicitly asked for parallel
> > option. See the fix in the attached patch.
>
> Agreed. But with the updated patch the PARALLEL option without the
> parallel degree doesn't display warning because params->nworkers = 0
> in that case. So how about restoring params->nworkers at the end of
> vacuum_rel()?
>

I had also thought on those lines, but I was not entirely sure about
this resetting of workers. Today, again thinking about it, it seems
the idea Mahendra is suggesting that is giving an error if the
parallel degree is not specified seems reasonable to me. This means
Vacuum (parallel), Vacuum (parallel) <tbl_name>, etc. will give an
error "parallel degree must be specified". This idea has merit as now
we are supporting a parallel vacuum by default, so a 'parallel' option
without a parallel degree doesn't have any meaning. If we do that,
then we don't need to do anything additional about the handling of
temp tables (other than what patch is already doing) as well. What do
you think?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-01-13 06:38:26 Re: [HACKERS] Block level parallel vacuum
Previous Message Dilip Kumar 2020-01-13 03:25:57 Re: Questions/Observations related to Gist vacuum