Re: parallel vacuum options/syntax

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: parallel vacuum options/syntax
Date: 2020-01-05 01:10:34
Message-ID: 20200105011034.ufg6oa2ziyebaild@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote:
>On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>wrote:
>>
>> Hi,
>>
>> I am starting a new thread for some of the decisions for a parallel
>> vacuum in the hope to get feedback from more people. There are
>> mainly two points for which we need some feedback.
>>
>> 1. Tomas Vondra has pointed out on the main thread [1] that by
>> default the parallel vacuum should be enabled similar to what we do
>> for Create Index. As proposed, the patch enables it only when the
>> user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;). One of the
>> arguments in favor of enabling it by default as mentioned by Tomas is
>> "It's pretty much the same thing we did with vacuum throttling - it's
>> disabled for explicit vacuum by default, but you can enable it. If
>> you're worried about VACUUM causing issues, you should set cost
>> delay.". Some of the arguments against enabling it are that it will
>> lead to use of more resources (like CPU, I/O) which users might or
>> might like.
>>
>
>I'm a bit wary of making parallel vacuum enabled by default. Single
>process vacuum does sequential reads/writes on most of indexes but
>parallel vacuum does random access random reads/writes. I've tested
>parallel vacuum on HDD and confirmed the performance is good but I'm
>concerned that it might be cause of more disk I/O than user expected.
>

I understand the concern, but it's not clear to me why to apply this
defensive approach just to vacuum and not to all commands. Especially
when we do have a way to throttle vacuum (unlike pretty much any other
command) if I/O really is a scarce resource.

As the vacuum workers are separate processes, each generating requests
with a sequential pattern, so I'd expect readahead to kick in and keep
the efficiency of sequential access pattern.

>> Now, if we want to enable it by default, we need a way to disable it
>> as well and along with that, we need a way for users to specify a
>> parallel degree. I have mentioned a few reasons why we need a
>> parallel degree for this operation in the email [2] on the main
>> thread.
>>
>> If parallel vacuum is *not* enabled by default, then I think the
>> current way to enable is fine which is as follows: Vacuum (Parallel
>> 2) <tbl_name>;
>>
>> Here, if the user doesn't specify parallel_degree, then we internally
>> decide based on number of indexes that support a parallel vacuum with
>> a maximum of max_parallel_maintenance_workers.
>>
>> If the parallel vacuum is enabled by default, then I could think of
>> the following ways:
>>
>> (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel
>> <parallel_degree>) <tbl_name>;
>>
>> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user
>> specifies parallel_degree as 0, then disable parallelism.
>>
>> (c) ... Any better ideas?
>>
>
>If parallel vacuum is enabled by default, I would prefer (b) but I
>don't think it's a good idea to accept 0 as parallel degree. If we want
>to disable parallel vacuum we should max_parallel_maintenance_workers
>to 0 instead.
>

IMO that just makes the interaction between vacuum options and the GUC
even more complicated/confusing.

If we want to have a vacuum option to determine parallel degree, we
should probably have a vacuum option to disable parallelism using just a
vacuum option. I don't think 0 is too bad, and disable_parallel seems a
bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n).
That's what Oracle does, so it's not entirely without a precedent.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2020-01-05 01:19:16 Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Previous Message Peter Geoghegan 2020-01-05 00:42:56 Re: pgsql: Add basic TAP tests for psql's tab-completion logic.