Re: parallel vacuum options/syntax

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: parallel vacuum options/syntax
Date: 2020-01-05 10:26:35
Message-ID: CAA4eK1+Hnjj1sKd1hSzRFcOzxcxEAC48gX88hAbUMJ8amU1eYA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 5, 2020 at 6:40 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote:
> >On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> >wrote:
> >>
> >> Hi,
> >>
> >> I am starting a new thread to discuss some of the decisions for
> >> parallel vacuum, in the hope of getting feedback from more people.
> >> There are mainly two points on which we need feedback.
> >>
> >> 1. Tomas Vondra has pointed out on the main thread [1] that by
> >> default the parallel vacuum should be enabled similar to what we do
> >> for Create Index. As proposed, the patch enables it only when the
> >> user specifies it (e.g., Vacuum (Parallel 2) <tbl_name>;). One of the
> >> arguments in favor of enabling it by default as mentioned by Tomas is
> >> "It's pretty much the same thing we did with vacuum throttling - it's
> >> disabled for explicit vacuum by default, but you can enable it. If
> >> you're worried about VACUUM causing issues, you should set cost
> >> delay.". Some of the arguments against enabling it are that it will
> >> lead to the use of more resources (like CPU and I/O), which users
> >> might or might not like.
> >>
> >
> >I'm a bit wary of making parallel vacuum enabled by default. Single
> >process vacuum does sequential reads/writes on most indexes, but
> >parallel vacuum does random reads/writes. I've tested parallel
> >vacuum on HDD and confirmed the performance is good, but I'm
> >concerned that it might cause more disk I/O than the user expects.
> >
>
> I understand the concern, but it's not clear to me why we should apply
> this defensive approach only to vacuum and not to all commands,
> especially when we do have a way to throttle vacuum (unlike pretty much
> any other command) if I/O really is a scarce resource.
>
> As the vacuum workers are separate processes, each generating requests
> with a sequential pattern, I'd expect readahead to kick in and keep
> the efficiency of the sequential access pattern.
>

Right, I also think so.

> >> Now, if we want to enable it by default, we need a way to disable it
> >> as well and along with that, we need a way for users to specify a
> >> parallel degree. I have mentioned a few reasons why we need a
> >> parallel degree for this operation in the email [2] on the main
> >> thread.
> >>
> >> If parallel vacuum is *not* enabled by default, then I think the
> >> current way of enabling it is fine: Vacuum (Parallel 2)
> >> <tbl_name>;
> >>
> >> Here, if the user doesn't specify parallel_degree, then we internally
> >> decide it based on the number of indexes that support parallel
> >> vacuum, with a maximum of max_parallel_maintenance_workers.
> >>
> >> If the parallel vacuum is enabled by default, then I could think of
> >> the following ways:
> >>
> >> (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel
> >> <parallel_degree>) <tbl_name>;
> >>
> >> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user
> >> specifies parallel_degree as 0, then disable parallelism.
> >>
> >> (c) ... Any better ideas?
> >>
> >
> >If parallel vacuum is enabled by default, I would prefer (b), but I
> >don't think it's a good idea to accept 0 as a parallel degree. If we
> >want to disable parallel vacuum, we should set
> >max_parallel_maintenance_workers to 0 instead.
> >
>
> IMO that just makes the interaction between vacuum options and the GUC
> even more complicated/confusing.
>

Yeah, I am also not sure if that will be a good idea.

> If we want to have a vacuum option to determine parallel degree, we
> should probably have a vacuum option to disable parallelism using just a
> vacuum option. I don't think 0 is too bad, and disable_parallel seems a
> bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n).
> That's what Oracle does, so it's not entirely without a precedent.
>

We can go either way (using 'Parallel 0' to disable parallelism, or
introducing a new option like NOPARALLEL). I think initially we can
avoid introducing more options and just go with 'Parallel 0'; if a lot
of people find it inconvenient, we can always introduce a new option
later.
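To make the 'Parallel 0' proposal concrete, below is a rough C sketch of how the parallel degree could be resolved under it. It is hypothetical and not the code in the patch: resolve_parallel_workers and its parameters are made-up names, an omitted PARALLEL option is represented as -1, and capping an explicit degree at max_parallel_maintenance_workers is an assumption (the thread only states that cap for the default case).

```c
#include <assert.h>

/*
 * Hypothetical sketch, not PostgreSQL's actual implementation.
 *
 * requested         : n from VACUUM (PARALLEL n); -1 if the option is omitted
 * nindexes_parallel : number of indexes that support parallel vacuum
 * max_maint_workers : current max_parallel_maintenance_workers setting
 */
static int
resolve_parallel_workers(int requested, int nindexes_parallel,
                         int max_maint_workers)
{
    int nworkers;

    assert(requested >= -1);
    assert(nindexes_parallel >= 0 && max_maint_workers >= 0);

    /* PARALLEL 0: the user explicitly disabled parallelism. */
    if (requested == 0)
        return 0;

    /* Option omitted: derive the degree from the eligible indexes. */
    if (requested < 0)
        nworkers = nindexes_parallel;
    else
        nworkers = requested;

    /* Assumption: the GUC caps explicit requests as well as the default. */
    if (nworkers > max_maint_workers)
        nworkers = max_maint_workers;

    return nworkers;
}
```

With max_parallel_maintenance_workers = 2, omitting the option on a table with three eligible indexes yields 2 workers, while 'Parallel 0' yields 0 regardless of the index count. The point of the sketch is that 'Parallel 0' needs no extra option and no interaction with the GUC to disable parallelism.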

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
