From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | Antonin Houska <ah(at)cybertec(dot)at> |
Cc: | jian he <jian(dot)universality(at)gmail(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: why there is not VACUUM FULL CONCURRENTLY? |
Date: | 2025-07-21 16:48:52 |
Message-ID: | 202507211648.mmyh5w4s25yb@alvherre.pgsql |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello
I started to read through 0001 and my first reaction is that I would
like to make a few more breaking changes. It appears that the current
patch tries to keep things unchanged, or to keep some things with the
CLUSTER name. I'm going to try and get rid of that. For instance, in
the grammar, I'm going to change the productions for CLUSTER so that
they produce a RepackStmt of the appropriate form; ClusterStmt as a
parse node should just be removed as no longer useful. Also, the
cluster() function becomes ExecRepack(), and things flow down from
there.
This leads to a small problem: right now you can say "CLUSTER;" which
processes all tables for which a clustered index has been defined.
In the submitted patch, REPACK doesn't support that mode, and we need a
way to distinguish the mode where it VACUUM FULL all tables from the
mode where it does the all-table CLUSTER. So I propose we allow that
mode with simply
CLUSTER USING INDEX;
which is consistent with the idea of "REPACK table USING INDEX foo"
being "CLUSTER table USING foo".
Implementation-wise, I'm toying with adding a new "command" for REPACK
called REPACK_COMMAND_CLUSTER_ALL, supporting this mode of operation.
That leads to a grammar like
> RepackStmt:
> REPACK USING INDEX
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_CLUSTER_ALL;
> n->relation = NULL;
> n->indexname = NULL;
> n->params = NIL;
> $$ = (Node *) n;
> }
> | REPACK qualified_name opt_using_index
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_REPACK;
> n->relation = $2;
> n->indexname = $3;
> n->params = NIL;
> $$ = (Node *) n;
> }
> | REPACK '(' utility_option_list ')' qualified_name opt_using_index
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_REPACK;
> n->relation = $5;
> n->indexname = $6;
> n->params = $3;
> $$ = (Node *) n;
> }
> | CLUSTER '(' utility_option_list ')'
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_CLUSTER_ALL;
> n->relation = NULL;
> n->indexname = NULL;
> n->params = $3;
> $$ = (Node *) n;
> }
> | CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_CLUSTER;
> n->relation = $5;
> n->indexname = $6;
> n->params = $3;
> $$ = (Node *) n;
> }
> /* unparenthesized VERBOSE kept for pre-14 compatibility */
> | CLUSTER opt_verbose qualified_name cluster_index_specification
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_CLUSTER;
> n->relation = $3;
> n->indexname = $4;
> n->params = NIL;
> if ($2)
> n->params = lappend(n->params, makeDefElem("verbose", NULL, @2));
> $$ = (Node *) n;
> }
> /* unparenthesized VERBOSE kept for pre-17 compatibility */
> | CLUSTER opt_verbose
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_CLUSTER_ALL;
> n->relation = NULL;
> n->indexname = NULL;
> n->params = NIL;
> if ($2)
> n->params = lappend(n->params, makeDefElem("verbose", NULL, @2));
> $$ = (Node *) n;
> }
> /* kept for pre-8.3 compatibility */
> | CLUSTER opt_verbose name ON qualified_name
> {
> RepackStmt *n = makeNode(RepackStmt);
>
> n->command = REPACK_COMMAND_CLUSTER;
> n->relation = $5;
> n->indexname = $3;
> n->params = NIL;
> if ($2)
> n->params = lappend(n->params, makeDefElem("verbose", NULL, @2));
> $$ = (Node *) n;
> }
> ;
>
> opt_using_index:
> ExistingIndex { $$ = $1; }
> | /*EMPTY*/ { $$ = NULL; }
> ;
>
> cluster_index_specification:
> USING name { $$ = $2; }
> | /*EMPTY*/ { $$ = NULL; }
> ;
It's a bit weird that CLUSTER uses just "USING" while REPACK uses
"USING INDEX", but of course we cannot change CLUSTER now; and I think
it's better to have the noise word INDEX for REPACK because it allows
the case of not specifying an index name as described above.
In the current patch we don't yet have a way to use REPACK for an
unadorned "VACUUM FULL" (which processes all tables), but I think I'll
add that as well, if only so that we can claim that the REPACK commands
handles all possible legacy command modes, even if they are not useful
in practice. That would probably be unadorned "REPACK;". With this, we
can easily state that all legacy CLUSTER and VACUUM FULL commands have
an equivalent REPACK formulation.
We're of course not going to _remove_ support for any of those legacy
commands. At the same time, we're not planning to add support for
CONCURRENTLY in the all-tables modes (patch 0004), but I don't think
that's a concern. Somebody could later implement that if they wanted
to, but I think it's pretty useless so IMO it's a waste of time.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)
From | Date | Subject | |
---|---|---|---|
Next Message | Sami Imseih | 2025-07-21 17:16:38 | Re: [PATCH] Proposal to Enable/Disable Index using ALTER INDEX |
Previous Message | Sami Imseih | 2025-07-21 16:40:22 | Re: POC: Parallel processing of indexes in autovacuum |