Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Steve Singer <steve(at)ssinger(dot)info>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Jose Luis Tallon <jltallon(at)adv-solutions(dot)net>
Subject: Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Date: 2021-02-01 15:28:57
Message-ID: 1c506564e2a732ec17790bd8c34f042e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-01-30 05:23, Michael Paquier wrote:
> On Fri, Jan 29, 2021 at 08:56:47PM +0300, Alexey Kondratov wrote:
>> On 2021-01-28 14:42, Alexey Kondratov wrote:
>>> No, you are right, we are doing REINDEX with AccessExclusiveLock as
>>> it
>>> was before. This part is a more specific one. It only applies to
>>> partitioned indexes, which do not hold any data, so we do not reindex
>>> them directly, only their leafs. However, if we are doing a
>>> TABLESPACE
>>> change, we have to record it in their pg_class entry, so all future
>>> leaf partitions were created in the proper tablespace.
>>>
>>> That way, we open partitioned index relation only for a reference,
>>> i.e. read-only, but modify pg_class entry under a proper lock
>>> (RowExclusiveLock). That's why I thought that ShareLock will be
>>> enough.
>>>
>>> IIUC, 'ALTER TABLE ... SET TABLESPACE' uses AccessExclusiveLock even
>>> for relations with no storage, since AlterTableGetLockLevel() chooses
>>> it if AT_SetTableSpace is met. This is very similar to our case, so
>>> probably we should do the same?
>>>
>>> Actually it is not completely clear for me why
>>> ShareUpdateExclusiveLock is sufficient for newly added
>>> SetRelationTableSpace() as Michael wrote in the comment.
>
> Nay, it was not fine. That's something Alvaro has mentioned, leading
> to 2484329. This also means that the main patch of this thread should
> refresh the comments at the top of CheckRelationTableSpaceMove() and
> SetRelationTableSpace() to mention that this is used by REINDEX
> CONCURRENTLY with a lower lock.
>

Hm, IIUC, REINDEX CONCURRENTLY doesn't use either of them. It directly
uses index_create() with a proper tablespaceOid instead of
SetRelationTableSpace(). And its checks structure is more restrictive
even without tablespace change, so it doesn't use
CheckRelationTableSpaceMove().

>> Changed patch to use AccessExclusiveLock in this part for now. This is
>> what
>> 'ALTER TABLE/INDEX ... SET TABLESPACE' and 'REINDEX' usually do.
>> Anyway, all
>> real leaf partitions are processed in the independent transactions
>> later.
>
> + if (partkind == RELKIND_PARTITIONED_INDEX)
> + {
> + Relation iRel = index_open(partoid, AccessExclusiveLock);
> +
> + if (CheckRelationTableSpaceMove(iRel,
> params->tablespaceOid))
> + SetRelationTableSpace(iRel,
> + params->tablespaceOid,
> + InvalidOid);
> + index_close(iRel, NoLock);
> Are you sure that this does not represent a risk of deadlocks as EAL
> is not taken consistently across all the partitions? A second issue
> here is that this breaks the assumption of REINDEX CONCURRENTLY kicked
> on partitioned relations that should use ShareUpdateExclusiveLock for
> all its steps. This would make the first transaction invasive for the
> user, but we don't want that.
>
> This makes me really wonder if we would not be better to restrict this
> operation for partitioned relation as part of REINDEX as a first step.
> Another thing, mentioned upthread, is that we could do this part of
> the switch at the last transaction, or we could silently *not* do the
> switch for partitioned indexes in the flow of REINDEX, letting users
> handle that with an extra ALTER TABLE SET TABLESPACE once REINDEX has
> finished on all the partitions, cascading the command only on the
> partitioned relation of a tree. It may be interesting to look as well
> at if we could lower the lock used for partitioned relations with
> ALTER TABLE SET TABLESPACE from AEL to SUEL, choosing AEL only if at
> least one partition with storage is involved in the command,
> CheckRelationTableSpaceMove() discarding anything that has no need to
> change.
>

I am not sure right now, so I split previous patch into two parts:

0001: Adds TABLESPACE into REINDEX with tests, doc and all the stuff we
did before with the only exception that it doesn't move partitioned
indexes into the new tablespace.

Basically, it implements this option "we could silently *not* do the
switch for partitioned indexes in the flow of REINDEX, letting users
handle that with an extra ALTER TABLE SET TABLESPACE once REINDEX has
finished". It probably makes sense, since we are doing tablespace change
altogether with index relation rewrite and don't touch relations without
storage. Doing ALTER INDEX ... SET TABLESPACE will be almost cost-less
on them, since they do not hold any data.

0002: Implements the remaining part where pg_class entry is also changed
for partitioned indexes. I think that we should think more about it,
maybe it is not so dangerous and proper locking strategy could be
achieved.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
v10-0002-Change-tablespace-of-partitioned-indexes-during-.patch text/x-diff 8.6 KB
v10-0001-Allow-REINDEX-to-change-tablespace.patch text/x-diff 30.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-02-01 15:47:14 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Previous Message Julien Rouhaud 2021-02-01 15:16:17 Re: Commitfest 2021-01 is now closed.