Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Date: 2018-12-27 12:06:54
Message-ID: f3255cfc-a395-133a-17d1-5fca873e7b05@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thank you all for replies.

>> ALTER TABLE already has a lot of logic that is oriented towards being
>> able to do multiple things at the same time. If we added CLUSTER,
>> VACUUM FULL, and REINDEX to that set, then you could, say, change a
>> data type, cluster, and change tablespaces all in a single SQL
>> command.
> That's a great observation.

Indeed, I thought that ALTER TABLE executes all actions sequentially one
by one, e.g. in the case of

ALTER TABLE test_int CLUSTER ON test_int_idx, SET TABLESPACE test_tblspc;

it executes CLUSTER and THEN executes SET TABLESPACE. However, if I get
it right, ALTER TABLE is rather smart, so in such a case it follows the
steps:

1) Only saves new tablespace Oid during prepare phase 1 without actual work;

2) Only executes mark_index_clustered during phase 2, again without
actual work done;

3) And finally rewrites relation during phase 3, where CLUSTER and SET
TABLESPACE are effectively performed.

>> That would be cool, but probably a lot of work. :-(
> But is it? ALTER TABLE is already doing one kind of table rewrite
> during phase 3, and CLUSTER is just a different kind of table rewrite
> (which happens to REINDEX), and VACUUM FULL is just a special case of
> CLUSTER. Maybe what we need is an ALTER TABLE variant that executes
> CLUSTER's table rewrite during phase 3 instead of its ad-hoc table
> rewrite.

According to the ALTER TABLE example above, it is already exist for CLUSTER.

> As for REINDEX, I think it's valuable to move tablespace together with
> the reindexing. You can already do it with the CREATE INDEX
> CONCURRENTLY recipe we recommend, of course; but REINDEX CONCURRENTLY is
> not going to provide that, and it seems worth doing.

Maybe I am missing something, but according to the docs REINDEX
CONCURRENTLY does not exist yet, DROP then CREATE CONCURRENTLY is
suggested instead. Thus, we have to add REINDEX CONCURRENTLY first, but
it is a matter of different patch, I guess.

>> Even for plain REINDEX that seems useful.
>> --
>> Michael

To summarize:

1) Alvaro and Michael agreed, that REINDEX with tablespace move may be
useful. This is done in the patch attached to my initial email. Adding
REINDEX to ALTER TABLE as new action seems quite questionable for me and
not completely semantically correct. ALTER already looks bulky.

2) If I am correct, 'ALTER TABLE ... CLUSTER ON ..., SET TABLESPACE ...'
does exactly what I wanted to add to CLUSTER in my patch. So probably no
work is necessary here.

3) VACUUM FULL. It seems, that we can add special case 'ALTER TABLE ...
VACUUM FULL, SET TABLESPACE ...', which will follow relatively the same
path as with CLUSTER ON, but without any specific index. Relation should
be rewritten in the new tablespace during phase 3.

What do you think?

Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2018-12-27 12:54:34 Re: global / super barriers (for checksums)
Previous Message Marcus Mao 2018-12-27 11:58:51 Re: PostgreSQL partition tables use more private memory