Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Mahendra Singh <mahi6run(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <langote_amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-12-30 05:10:39
Message-ID: CAA4eK1KOCFaoDYKpNnZzbhRN1W7ZR+1kpJrO3rt+EQMMW5ybbQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote:
> >>
> >> v40-0003-Add-FAST-option-to-vacuum-command.patch
> >> ------------------------------------------------
> >>
> >> I do have a bit of an issue with this part - I'm not quite convinced we
> >> actually need a FAST option, and I actually suspect we'll come to regret
> >> it sooner than later. AFAIK it pretty much does exactly the same thing
> >> as setting vacuum_cost_delay to 0, and IMO it's confusing to provide
> >> multiple ways to do the same thing - I do expect reports from confused
> >> users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a
> >> sufficient solution?
> >
> >I think the motivation of this option is similar to FREEZE. I think
> >it's sometimes a good idea to have a shortcut for a popular usage and
> >give it a name corresponding to its job. From that perspective I
> >think having a FAST option would make sense, but maybe we need more
> >discussion of the combination of parallel vacuum and vacuum delay.
> >
>
> OK. I think it's a mostly independent piece, so maybe we should move it
> to a separate patch. It's more likely to get attention/feedback when not
> buried in this thread.
>

+1. It is already a separate patch, and I think we can discuss it
further in a new thread once the main patch is committed. Or do you
think we should reach a conclusion about it now? To me, this option
looks like an extension of the main feature that can be useful for
some users, and people might like to have a separate option, so we can
discuss it and get broader feedback after the main patch is committed.

> >>
> >> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do
> >> we need a separate VACUUM option, instead of just using the existing
> >> max_parallel_maintenance_workers GUC?
> >>

How will the user specify the parallel degree? The parallel degree is
helpful because in some cases users can decide how many workers should
be launched based on the size and type of indexes.

> >> It's good enough for CREATE INDEX
> >> so why not here?
> >

That is a different feature, and I think here users can make a better
judgment based on the size of indexes. Moreover, users have an option
to control the parallel degree for CREATE INDEX via ALTER TABLE
<tbl_name> SET (parallel_workers = <n>), which I am not sure is a good
idea for parallel vacuum, as the parallelism there is derived more from
the size and type of indexes. We could think of a similar parameter at
the table/index level for parallel vacuum, but I don't see it being
equally useful in this case.

> >AFAIR there was no such discussion so far, but I think one reason could
> >be that parallel vacuum should be disabled by default. If parallel
> >vacuum used max_parallel_maintenance_workers (2 by default) rather
> >than having the option, parallel vacuum would run with the default
> >setting, but I think that would have a big impact on users because
> >the disk access could become random reads and writes when some indexes
> >are on the same tablespace.
> >
>
> I'm not quite convinced VACUUM should have parallelism disabled by
> default. I know some people argued we should do that because making
> vacuum faster may put pressure on other parts of the system. Which is
> true, but I don't think the solution is to make vacuum slower by
> default. IMHO we should do the opposite - have it parallel by default
> (as driven by max_parallel_maintenance_workers), and have an option
> to disable parallelism.
>

I think driving parallelism for vacuum by
max_parallel_maintenance_workers alone might not be sufficient. We need
to give finer control, as it depends a lot on the size of indexes. Also,
unlike CREATE INDEX, VACUUM can be performed on an entire database, and
it is quite possible that some tables/indexes are relatively small,
so forcing parallelism on them by default might slow down the
operation.
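
To make the size-dependence concrete, here is a rough sketch of one way
a per-table worker count could be derived from index sizes. It is only
an illustration, not what the patch does: the function name, the
"requested" parameter, and the 512 kB threshold are all assumptions
made up for this example.

```python
from typing import Sequence

# Hypothetical threshold: indexes smaller than this are treated as too
# small to be worth a parallel worker. The value is illustrative only.
MIN_INDEX_SIZE_BYTES = 512 * 1024


def choose_vacuum_workers(index_sizes: Sequence[int],
                          requested: int,
                          max_workers: int,
                          min_index_size: int = MIN_INDEX_SIZE_BYTES) -> int:
    """Return how many parallel workers to launch for one table.

    index_sizes: size in bytes of each index on the table.
    requested:   degree given by the user; 0 means "let the system decide".
    max_workers: upper bound, e.g. max_parallel_maintenance_workers.
    """
    # Only indexes that are large enough count toward parallelism.
    eligible = sum(1 for size in index_sizes if size >= min_index_size)

    # The leader processes one index itself, so with fewer than two
    # eligible indexes there is nothing to hand to a worker.
    if eligible < 2:
        return 0

    wanted = requested if requested > 0 else eligible - 1
    return max(0, min(wanted, eligible - 1, max_workers))


if __name__ == "__main__":
    # A table with only tiny indexes gets no workers even if parallelism
    # is allowed; a table with several large indexes does.
    print(choose_vacuum_workers([8192, 16384], 0, 2))
    print(choose_vacuum_workers([10**9, 10**9, 10**9], 0, 2))
```

With a rule like this, a database-wide VACUUM would skip parallelism
for tables whose indexes are small, while an explicit degree from the
user would still be capped by the number of sufficiently large indexes.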

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
