Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Steve Singer <steve(at)ssinger(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Jose Luis Tallon <jltallon(at)adv-solutions(dot)net>
Subject: Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Date: 2019-12-02 09:41:03
Message-ID: e625699b-bb82-a28b-370b-2da4bbdb72ed@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02.12.2019 11:21, Michael Paquier wrote:
> On Wed, Nov 27, 2019 at 08:47:06PM +0300, Alexey Kondratov wrote:
>> The only difference is that point 3) and tablespace part of 5) were missing
>> in RelationSetNewRelfilenode, so I added them, and I do 4) after 6) in
>> REINDEX. Thus, it seems that in my implementation of tablespace change in
>> REINDEX I am more sure that "the relation tablespace is correctly updated
>> before reindexing", since I do reindex after CCI (point 6), doesn't it?
>>
>> So why it is fine for ATExecSetTableSpace to do pretty much the same, but
>> not for REINDEX? Or the key point is in doing actual work before CCI, but
>> for me it seems a bit against what you have wrote?
> Nope, the order is not the same on what you do here, causing a
> duplication in the tablespace selection within
> RelationSetNewRelfilenode() and when flushing the relation on the new
> tablespace for the first time after the CCI happens, please see
> below. And we should avoid that.
>
>> Thus, I cannot get your point correctly here. Can you, please, elaborate a
>> little bit more your concerns?
> The case of REINDEX CONCURRENTLY is pretty simple, because a new
> relation which is a copy of the old relation is created before doing
> the reindex, so you simply need to set the tablespace OID correctly
> in index_concurrently_create_copy(). And actually, I think that the
> computation is incorrect because we need to check after
> MyDatabaseTableSpace as well, no?

No, the same logic already exists in heap_create:

    if (reltablespace == MyDatabaseTableSpace)
        reltablespace = InvalidOid;

Which is called by index_concurrently_create_copy -> index_create ->
heap_create.

> The case of REINDEX is more tricky, because you are working on a
> relation that already exists, hence I think that what you need to do a
> different thing before the actual REINDEX:
> 1) Update the existing relation's pg_class tuple to point to the new
> tablespace.
> 2) Do a CommandCounterIncrement.
> So I think that the order of the operations you are doing is incorrect,
> and that you have a risk of breaking the existing tablespace assignment
> logic done when first flushing a new relfilenode.
>
> This actually brings an extra thing: when doing a plain REINDEX you
> need to make sure that the past relfilenode of the relation gets away
> properly. The attached POC patch does that before doing the CCI which
> is a bit ugly, but that's enough to show my point, and there is no
> need to touch RelationSetNewRelfilenode() this way.

Thank you for the detailed answer and PoC patch. I will recheck
everything and dig deeper into this problem, and come up with something
closer to the next 01.2020 commitfest.

Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message yuzuko 2019-12-02 09:42:22 Re: Autovacuum on partitioned table
Previous Message yuzuko 2019-12-02 09:25:37 Re: Partitioning versus autovacuum