Re: Add SKIP LOCKED to VACUUM and ANALYZE

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Lyes Ameddah <lyes(dot)amd(at)gmail(dot)com>
Subject: Re: Add SKIP LOCKED to VACUUM and ANALYZE
Date: 2018-07-17 06:21:48
Message-ID: 20180717062148.GE3388@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 12, 2018 at 02:37:28PM +0900, Michael Paquier wrote:
> On Wed, Jun 13, 2018 at 08:29:12PM +0000, Bossart, Nathan wrote:
>> Previous thread: https://postgr.es/m/4BC0F3CD-F4B5-4F23-AADB-80607F9E4B4E%40amazon.com
>>
>> This is a new thread for tracking the work to add SKIP LOCKED to
>> VACUUM and ANALYZE. I've attached a rebased set of patches.
>
> I am beginning to look at this stuff, and committed patch 4 and 5 on the
> way as they make sense independently. I am still reviewing the rest,
> which applies with some fuzz, and the tests proposed still pass.

I have been looking at the rest of the patch set, and I find that the
way SKIP_LOCKED is handled is a bit inconsistent. First, if the manual
VACUUM does not specify a list of relations, which can never happen for
autovacuum, then we get to use get_all_vacuum_rels to decide the list of
relations to look at, and then it is up to vacuum_rel() to decide if a
relation can be skipped or not. If a list of relation is given by the
caller, then it is up to expand_vacuum_rel() to do the call.

In expand_vacuum_rel(), an OID could be defined for a relation, which is
something used by autovacuum. If no OID is defined, which happens for a
manual VACUUM, then we use directly RangeVarGetRelidExtended() at this
stage and see if the relation is available. If the relation listed in
the manual VACUUM is a partitioned table, then its full set of
partition OIDs (including down layers), are included in the list of
relations to include. And we definitely want to hold, then release once
AccessShareLock so as the partition tree lookup can happen. This uses
as well find_all_inheritors() with NoLock so as we rely on vacuum_rel()
to skip the relation or not.

The first thing which is striking me is that we may actually *not* want
to check for lock skipping within expand_vacuum_rel() as that's mainly a
function aimed at building the relations which are going to be vacuumed,
and that all the lock skipping operations should happen at the beginning
of vacuum_rel(). This way, even if the parent of a partition tree is
listed but cannot have its RangeVar opened immediately, then we have a
chance to have some of its partitions to be vacuumed after listing them.
This point is debatable as there are pros and cons. I would be of the
opinion to not change expand_vacuum_rel()'s mission to build the list of
relations to VACUUM, and actually complain about a lock not available
when the relation is ready to be processed individually. It is also
perfectly possible to skip locks for partitions by listing them
individually in the VACUUM command used. I am pretty sure that Andres
would complain at the sight of this paragraph as this is backwards with
the reason behind the refactoring of RangeVarGetRelidExtended but I like
the new API better :p

For this part, it seems to me that we can do better than what is in
HEAD:
- Call RangeVarGetRelidExtended without lock.
- Check for has_subclass(), which should be extended so as it does not
complain because of a missing relation so as vacuum.c does the error
handling.
- Call RangeVarGetRelidExtended a second time if a lookup with
find_all_inheritors is necessary. If the relation goes missing
in-between then we just get an error as the current behavior.

I am also questioning the fact of complicating vac_open_indexes() by
skipping a full relation vacuum if one of its indexes cannot be opened,
which would be the case for an ALTER INDEX for example. It seems to me
that skipping locks should just happen for the parent relation, so I
would not bother much about this case, and just document the behavior.
If somebody argues for this option, we could always have a SKIP_INDEXES
or such.

"SKIP LOCKED" should be named "SKIP_LOCKED", with an underscore for
consistency with the other options.

-void
-cluster_rel(Oid tableOid, Oid indexOid, bool recheck, bool verbose)
+bool
+cluster_rel(Oid tableOid, Oid indexOid, bool recheck, bool verbose,
-bool skip_locked)
{
Some refactoring of CLUSTER on the way here? It would be nice to avoid
having three boolean arguments here, and opens the door for an extended
CLUSTER grammar with parenthesis as well as a SKIP_LOCKED option for
it. And this could be refactored first. VACUUM FULL being a variant of
CLUSTER, we could just reuse the same options... Now using separate
option names could be cleaner.

The stuff of get_elevel_for_skipped_relation could be refactored into
something used as well by cluster_rel as the same logic is used in three
places (vacuum_rel, analyze_rel and cluster_rel with try_relation_open).

I think that it would be most interesting to get the refactoring around
get_elevel_for_skip_locked and cluster_rel done first. The improvement
for RangeVarGetRelidExtended with relations not having subclasses could
be worth done separately as well.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kaye Ann Ignacio 2018-07-17 06:43:03 foreign key to foreign table
Previous Message Kato, Sho 2018-07-17 05:44:41 RE: How to make partitioning scale better for larger numbers of partitions