parallel vacuum options/syntax

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: parallel vacuum options/syntax
Date: 2020-01-02 12:08:50
Message-ID: CAA4eK1LBUfVQu7jCfL20MAF+RzUssP06mcBEcSZb8XktD7X1BA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I am starting a new thread for some of the decisions for a parallel vacuum
in the hope to get feedback from more people. There are mainly two points
for which we need some feedback.

1. Tomas Vondra has pointed out on the main thread [1] that by default the
parallel vacuum should be enabled similar to what we do for Create Index.
As proposed, the patch enables it only when the user specifies it (ex.
Vacuum (Parallel 2) <tbl_name>;). One of the arguments in favor of
enabling it by default as mentioned by Tomas is "It's pretty much the same
thing we did with vacuum throttling - it's disabled for explicit vacuum by
default, but you can enable it. If you're worried about VACUUM causing
issues, you should set cost delay.". Some of the arguments against
enabling it are that it will lead to use of more resources (like CPU, I/O)
which users might or might like.

Now, if we want to enable it by default, we need a way to disable it as
well and along with that, we need a way for users to specify a parallel
degree. I have mentioned a few reasons why we need a parallel degree for
this operation in the email [2] on the main thread.

If parallel vacuum is **not** enabled by default, then I think the current
way to enable is fine which is as follows:
Vacuum (Parallel 2) <tbl_name>;

Here, if the user doesn't specify parallel_degree, then we internally
decide based on number of indexes that support a parallel vacuum with a
maximum of max_parallel_maintenance_workers.

If the parallel vacuum is enabled by default, then I could think of the
following ways:
(a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel
<parallel_degree>) <tbl_name>;
(b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user specifies
parallel_degree as 0, then disable parallelism.
(c) ... Any better ideas?

2. The patch provides a FAST option (based on suggestion by Robert) for a
parallel vacuum which will make it behave like vacuum_cost_delay = 0 which
means it will disable throttling. So,
VACUUM (PARALLEL n, FAST) <tbl_name> will allow the parallel vacuum to run
without resource throttling. Tomas thinks that we don't need such an
option as the same can be served by setting vacuum_cost_delay = 0 which is
a valid argument, but OTOH, providing an option to the user which can make
his life easier is not a bad idea either.

Thoughts?

[1] -
https://www.postgresql.org/message-id/20191229212354.tqivttn23lxjg2jz%40development
[2] -
https://www.postgresql.org/message-id/CAA4eK1%2B1o-BaPvJnK7BPThTryx3MRDS%2BmCf9eVVZT%3DSVJ8mwLg%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-01-02 12:14:10 Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Previous Message Alvaro Herrera 2020-01-02 11:56:40 Re: TRUNCATE on foreign tables