Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Steve Singer <steve(at)ssinger(dot)info>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Jose Luis Tallon <jltallon(at)adv-solutions(dot)net>
Subject: Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Date: 2021-01-30 02:23:01
Message-ID: YBTC0IYIGmyYfiIn@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 29, 2021 at 08:56:47PM +0300, Alexey Kondratov wrote:
> On 2021-01-28 14:42, Alexey Kondratov wrote:
>> No, you are right, we are doing REINDEX with AccessExclusiveLock as it
>> was before. This part is a more specific one. It only applies to
>> partitioned indexes, which do not hold any data, so we do not reindex
>> them directly, only their leafs. However, if we are doing a TABLESPACE
>> change, we have to record it in their pg_class entry, so all future
>> leaf partitions were created in the proper tablespace.
>>
>> That way, we open partitioned index relation only for a reference,
>> i.e. read-only, but modify pg_class entry under a proper lock
>> (RowExclusiveLock). That's why I thought that ShareLock will be
>> enough.
>>
>> IIUC, 'ALTER TABLE ... SET TABLESPACE' uses AccessExclusiveLock even
>> for relations with no storage, since AlterTableGetLockLevel() chooses
>> it if AT_SetTableSpace is met. This is very similar to our case, so
>> probably we should do the same?
>>
>> Actually it is not completely clear for me why
>> ShareUpdateExclusiveLock is sufficient for newly added
>> SetRelationTableSpace() as Michael wrote in the comment.

Nay, it was not fine. That's something Alvaro has mentioned, leading
to 2484329. This also means that the main patch of this thread should
refresh the comments at the top of CheckRelationTableSpaceMove() and
SetRelationTableSpace() to mention that this is used by REINDEX
CONCURRENTLY with a lower lock.

> Changed patch to use AccessExclusiveLock in this part for now. This is what
> 'ALTER TABLE/INDEX ... SET TABLESPACE' and 'REINDEX' usually do. Anyway, all
> real leaf partitions are processed in the independent transactions later.

+ if (partkind == RELKIND_PARTITIONED_INDEX)
+ {
+ Relation iRel = index_open(partoid, AccessExclusiveLock);
+
+ if (CheckRelationTableSpaceMove(iRel, params->tablespaceOid))
+ SetRelationTableSpace(iRel,
+ params->tablespaceOid,
+ InvalidOid);
+ index_close(iRel, NoLock);
Are you sure that this does not represent a risk of deadlocks as EAL
is not taken consistently across all the partitions? A second issue
here is that this breaks the assumption of REINDEX CONCURRENTLY kicked
on partitioned relations that should use ShareUpdateExclusiveLock for
all its steps. This would make the first transaction invasive for the
user, but we don't want that.

This makes me really wonder if we would not be better to restrict this
operation for partitioned relation as part of REINDEX as a first step.
Another thing, mentioned upthread, is that we could do this part of
the switch at the last transaction, or we could silently *not* do the
switch for partitioned indexes in the flow of REINDEX, letting users
handle that with an extra ALTER TABLE SET TABLESPACE once REINDEX has
finished on all the partitions, cascading the command only on the
partitioned relation of a tree. It may be interesting to look as well
at if we could lower the lock used for partitioned relations with
ALTER TABLE SET TABLESPACE from AEL to SUEL, choosing AEL only if at
least one partition with storage is involved in the command,
CheckRelationTableSpaceMove() discarding anything that has no need to
change.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-01-30 02:30:11 Re: LogwrtResult contended spinlock
Previous Message Peter Geoghegan 2021-01-30 02:03:58 Re: Thoughts on "killed tuples" index hint bits support on standby