From: | Robert Treat <rob(at)xzilla(dot)net> |
---|---|
To: | Álvaro Herrera <alvherre(at)kurilemu(dot)de> |
Cc: | Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
Subject: | Re: Adding REPACK [concurrently] |
Date: | 2025-10-13 00:03:18 |
Message-ID: | CABV9wwOo=wvq1hwTRK6HgBWUB=ekzsEebY30EWoc1V9UJQrrrw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Oct 7, 2025 at 10:05 AM Álvaro Herrera <alvherre(at)kurilemu(dot)de> wrote:
> On 2025-Sep-26, Robert Treat wrote:
<snip>
> That said, on this topic, I've always been bothered by our usage of
> command names as verbs, because they are (IMO) horrible for translation.
> For instance, in this version of the patch I am making this change:
>
> if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
> ereport(ERROR,
> - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> - errmsg("cannot cluster a shared catalog")));
> + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot run %s on a shared catalog",
> + RepackCommandAsString(cmd)));
>
> In the old version, the message is not very translatable because you
> have to find a native word to say "to cluster" or "to vacuum", and that
> doesn't always work very well in a direct translation. For instance, in
> the Spanish message catalog you find this sort of thing:
>
> msgid "vacuuming \"%s.%s.%s\""
> msgstr "haciendo vacuum a «%s.%s.%s»"
>
> which is pretty clear ... but the reason it works, is that I have turned
> the phrase around before translating it. I would struggle if I had to
> find a Spanish verb that means "to repack" without contorting the
> message or saying something absurd and/or against Spanish language
> rules, such as "ejecutando repack en table XYZ" or "repaqueando tabl
> XYZ" (that's not a word!) or "reempaquetando tabla XYZ" (this is
> correct, but far enough from "repack" that it's annoying and potentially
> confusing). So I would rather the original used "running REPACK on
> table using method XYZ", which is very very easy to translate, and then
> the translator doesn't have to editorialize.
>
I see you didn't do this in the current patch, but +1 for this idea
from me. And if you think it'd help, I'm also +1 on the idea for the
main docs as well, for example doing something like
+ <para>
- <application>pg_repackdb</application> is a utility for repacking a
+ <application>pg_repackdb</application> is a utility for running REPACK on a
+ <productname>PostgreSQL</productname> database.
I'd be inclined to leave the internal comments alone though, since
they aren't translated.
> > #5
> > [xzilla(at)zebes] pgsql/bin/pg_repackdb -d pagila -v -t film --index
> > pg_repackdb: repacking database "pagila"
> >
> > In the above scenario, I am repacking without having previously
> > specified an index. At the SQL level this would throw an error, at the
> > command line it gives me a heart attack. :-)
> > It's actually not that bad, because we don't actually do anything, but
> > maybe we should throw an error?
>
> Yeah, I think this is confusing. I think we should make pg_repackdb
> explicitly indicate what has been done, in all cases, without requiring
> -v. Otherwise it's too confusing, particularly for the using-index mode
> which determines which tables to process based on the existance of an
> index marked indiscluster.
>
At the moment, clusterdb runs silently, but vacuumdb emits output, so
there is an argument for either way as default behavior. That said, I
think the current behavior of vacuum, which is what we are currently
following in pg_repackdb, is the worst of the two:
[xzilla(at)zebes] pgsql/bin/vacuumdb -t actor pagila
vacuumdb: vacuuming database "pagila"
Without any additional information, the information we do give is
misleading; I would rather not say anything. We could of course try to
make this more verbose, but I think clusterdb actually gets this
right...
- say nothing by default (follow the "rule of silence.")
- if we want to see commands, pass -e
- if we want to see the details, pass -v
- if we do something that causes an error, return the error
- if we don't want errors, pass -q
This is also how reindexdb works, and I think most of the other
utilities, and I'd argue this is how vacuumdb should work... to the
extent I almost consider it a bug that it doesn't (I leave a little
room since I am not sure why it doesn't operate like the other
utilities). vacuum is a bit outside the purview of what we are doing
here, but I do think following clusterdb/reindexdb is the behavior we
should follow for pg_repackdb.
> I admit I haven't paid too much attention to these tests. I think I
> would rather create a separate src/test/regress/sql/repack.sql file with
> the tests for this command. Let's consider this part a WIP for now --
> clearly more tests are needed both for the SQL command CLUSTER and for
> pg_repackdb.
Yeah, istm as long as we have all 3 commands (repack, cluster, vacuum
full) we need regression tests for all 3.
> - pg_stat_progress_cluster is no longer a view on top of the low-level
> pg_stat_get_progress_info() function. Instead, it's a view on top of
> pg_stat_progress_repack. The only change it applies on top of that
> one is change the command from REPACK to one of VACUUM FULL or
> CLUSTER, depending on whether an index is being used or not. This
> should keep the behavior identical to previous versions.
> Alternatively we could just hide rows where the command is REPACK, but
> I don't think that would be any better. This way, we maintain
> compatibility with tools reading pg_stat_progress_cluster. Maybe this
> is useless and we should just drop the view, not sure, we can discuss
> separately.
>
I think this mostly depends on how aggressive you want to be in moving
people away from cluster and toward repack. If we remove
_progress_cluster, it will force people to update monitoring which
probably encourages people to switch to pg_repackdb. We probably need
to have at least one "bridge" release though, and I think you've got
the right balance for that.
> - I noticed that you can do "CLUSTER pg_class ON some_index" and it will
> happily modify pg_index.indisclustered, which is a bit weird
> considering that allow_system_table_mods is off -- if you later try
> ALTER TABLE .. SET WITHOUT CLUSTER, it won't let you. I think this is
> bogus and we should change it so that CLUSTER refuses to change the
> clustered index on a system catalog, unless allow_system_table_mods is
> on. However, that would be a change from longstanding behavior which
> is specifically tested for in regression tests, so I didn't do it.
> We can discuss such a change separately. But I did make REPACK refuse
> to do that, because we don't need to propagate bogus historical
> behavior. So REPACK will fail if you try to change the indisclustered
> index, but it will work fine if you repack based on the same index as
> before, or repack with no index.
>
Since cluster will presumably be deprecated with this release, I'd
leave the existing behavior and move forward with repack as you've
laid out.
> - pg_repackdb: if you try with a non-superuser without specifying a
> table name, it will fail as soon as it hits the first catalog table or
> whatever with "ERROR: cannot lock this table". This is sorta fine for
> vacuumdb, but only because VACUUM itself will instead say "WARNING:
> cannot lock table XYZ, skipping", so it's not an error and vacuumdb
> keeps running. IMO this is bogus: vacuumdb should not try to process
> tables that it doesn't have privileges to. However, not wanting to
> change longstanding behavior, I left that alone. For pg_repackdb, I
> added a condition in the WHERE clause there to only fetch tables that
> the current user has MAINTAIN privilege over. Then you can do a
> "pg_repackdb -U foobar" and it will nicely process the tables that
> that user is allowed to process. We can discuss changing the vacuumdb
> behavior separately.
Again, vacuumdb seems to be a good example of what not to do, but I'll
leave that for another thread. In general I like this idea, but it
does make for a weird corner case where if I specify a table with -t
that I don't have permission to repack, repack returns silently whilst
doing nothing. I suppose one way to handle that would be to check if
the table passed in -t is found in the list of tables with MAINTAIN
privileges, and if not to issue a WARNING like "%s not found. Make
sure that the table exists and that you have MAINTAIN privileges".
Robert Treat
https://xzilla.net
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Smith | 2025-10-13 00:59:21 | Re: Logical Replication of sequences |
Previous Message | Alexey Makhmutov | 2025-10-12 23:58:38 | Re: Adding basic NUMA awareness |