From: | Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: REPACK and naming |
Date: | 2025-09-19 09:36:08 |
Message-ID: | 202509190909.2o7xtjvlytv2@alvherre.pgsql |
Lists: | pgsql-hackers |
On 2025-Sep-19, David Rowley wrote:
> I was just thinking about how much of a heap-ism cluster using an
> index is. If we were to ever have an index organised table AM, what
> would it mean to REPACK tab USING INDEX idx? Would that "secondary"
> index then go away and the table would become that index? or would
> both continue to exist and the secondary index would be surplus?
So, there's already an implementation of an index-organized table in
OrioleDB, as I understand, so maybe we can ask Alexander K. about this.
I suspect it's fine to say that if you have a table for which it makes
no sense to use REPACK USING INDEX, then we just throw an error in that
case (but I suppose plain REPACK continues to work, and it just
recreates/compacts the primary index and rebuilds all secondary indexes,
just like VACUUM FULL would presumably do.)
> I do understand that heap is well ingrained in our code (still), but
> at least things like system catalogue tables/columns can evolve over
> time. e.g. pg_index.indisclustered I could imagine evolving (or
> disappearing) if we had an IOT-AM. I do think locking in syntax is
> going to be quite a bit more permanent and needs to be considered very
> carefully. Something like REPACK tab ORDER BY col1; seems a bit more
> future proof.
Oh, I think we can implement REPACK tab ORDER BY all right -- do note
that the current syntax has mandatory USING INDEX keywords (unlike
CLUSTER), so we can add that feature and others with no grammar
problems. In fact even for current heaps it might make sense to allow
an ORDER BY clause for which there's no index. I don't see us
gratuitously removing the option of specifying just an index name (or
indisclustered), though, because there are likely users that have been
running that for years.
> table_relation_copy_for_cluster() does support both use
> of an Index to get presorted results and sorting by the index's key
> columns, so it doesn't seem impossible that the ability to cluster a
> table *specifically* by an index couldn't easily go away at some
> point.
Well, I hope you mean that clustering by an index would stop being the
_only_ way, not that it would completely disappear as an option.
> Locking us deeper into a syntax for that, I do have concerns for. But
> maybe you've thought about all this already and I'm just not aware...
At this point we're not *implementing* any of that, but it is possible
to do so afterwards and we're not blocking that road.
> I'm also trying to keep something like a column store in mind here
> where you might not have any indexes, and efficient filtering is done
> via the pruning of "chunks", which works by each chunk recording the
> min/max (or maybe a dictionary of) values it contains for the columns.
> I imagine something like that very much would want the ability to have
> something like REPACK tbl ORDER BY col; if you think how efficient
> run-length encoding would be for some orders and how inefficient it
> could be for other orders.
That makes sense, yes, and again, AFAICT it can easily be implemented on
top of the current work.
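
To make the point about row order concrete, here is a minimal sketch in
Python. It is not PostgreSQL code and does not model any real column store;
the helper names (rle_runs, chunk_ranges, chunks_to_scan) and the chunk size
of 100 rows are assumptions made purely for illustration. It compares the
same column values stored in an interleaved order and in sorted order, as a
hypothetical REPACK tbl ORDER BY col might leave them.

```python
def rle_runs(values):
    """Count the runs a run-length encoder would emit for this sequence."""
    runs = 0
    previous = object()  # sentinel that compares unequal to every value
    for v in values:
        if v != previous:
            runs += 1
            previous = v
    return runs


def chunk_ranges(values, chunk_size):
    """Return a (min, max) summary for each fixed-size chunk of the column."""
    return [
        (min(values[i:i + chunk_size]), max(values[i:i + chunk_size]))
        for i in range(0, len(values), chunk_size)
    ]


def chunks_to_scan(ranges, target):
    """Count chunks whose min/max summary cannot rule out the target value."""
    return sum(1 for lo, hi in ranges if lo <= target <= hi)


# 1000 rows, 10 distinct values, stored interleaved (0, 1, ..., 9, 0, 1, ...)
interleaved = [i % 10 for i in range(1000)]
# The same rows after being rewritten in column order
ordered = sorted(interleaved)

print(rle_runs(interleaved))  # 1000 runs: no two neighbours are equal
print(rle_runs(ordered))      # 10 runs: one per distinct value

# Chunks that a lookup for the value 3 would still have to read
print(chunks_to_scan(chunk_ranges(interleaved, 100), 3))  # 10
print(chunks_to_scan(chunk_ranges(ordered, 100), 3))      # 1
```

In this toy data the sorted order needs 100 times fewer runs and lets the
min/max summaries skip nine of the ten chunks, which is the kind of effect
the quoted paragraph describes.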
> Anyway, I'm not intentionally trying to make your job here any more
> complex. I'm just trying to help make sure we don't end up with some
> new syntax that also won't stand up to the test of time.
The time you and others spend on this thread is much appreciated.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"If you want to have good ideas, you must have many ideas. Most of them
will be wrong, and what you have to learn is which ones to throw away."
(Linus Pauling)