Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Steve Singer <steve(at)ssinger(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Jose Luis Tallon <jltallon(at)adv-solutions(dot)net>
Subject: Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Date: 2019-12-02 09:41:03
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02.12.2019 11:21, Michael Paquier wrote:
> On Wed, Nov 27, 2019 at 08:47:06PM +0300, Alexey Kondratov wrote:
>> The only difference is that point 3) and tablespace part of 5) were missing
>> in RelationSetNewRelfilenode, so I added them, and I do 4) after 6) in
>> REINDEX. Thus, it seems that in my implementation of tablespace change in
>> REINDEX I am more sure that "the relation tablespace is correctly updated
>> before reindexing", since I do reindex after CCI (point 6), doesn't it?
>> So why it is fine for ATExecSetTableSpace to do pretty much the same, but
>> not for REINDEX? Or the key point is in doing actual work before CCI, but
>> for me it seems a bit against what you have wrote?
> Nope, the order is not the same on what you do here, causing a
> duplication in the tablespace selection within
> RelationSetNewRelfilenode() and when flushing the relation on the new
> tablespace for the first time after the CCI happens, please see
> below. And we should avoid that.
>> Thus, I cannot get your point correctly here. Can you, please, elaborate a
>> little bit more your concerns?
> The case of REINDEX CONCURRENTLY is pretty simple, because a new
> relation which is a copy of the old relation is created before doing
> the reindex, so you simply need to set the tablespace OID correctly
> in index_concurrently_create_copy(). And actually, I think that the
> computation is incorrect because we need to check after
> MyDatabaseTableSpace as well, no?

No, the same logic already exists in heap_create:

    if (reltablespace == MyDatabaseTableSpace)
        reltablespace = InvalidOid;

Which is called by index_concurrently_create_copy -> index_create ->

> The case of REINDEX is more tricky, because you are working on a
> relation that already exists, hence I think that what you need to do a
> different thing before the actual REINDEX:
> 1) Update the existing relation's pg_class tuple to point to the new
> tablespace.
> 2) Do a CommandCounterIncrement.
> So I think that the order of the operations you are doing is incorrect,
> and that you have a risk of breaking the existing tablespace assignment
> logic done when first flushing a new relfilenode.
> This actually brings an extra thing: when doing a plain REINDEX you
> need to make sure that the past relfilenode of the relation gets away
> properly. The attached POC patch does that before doing the CCI which
> is a bit ugly, but that's enough to show my point, and there is no
> need to touch RelationSetNewRelfilenode() this way.

Thank you for the detailed answer and PoC patch. I will recheck
everything and dig deeper into this problem, and come up with something
closer to the next 01.2020 commitfest.


Alexey Kondratov

Postgres Professional
Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message yuzuko 2019-12-02 09:42:22 Re: Autovacuum on partitioned table
Previous Message yuzuko 2019-12-02 09:25:37 Re: Partitioning versus autovacuum