Re: Modest proposal to extend TableAM API for controlling cluster commands

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Modest proposal to extend TableAM API for controlling cluster commands
Date: 2022-06-16 01:24:45
Message-ID: 63B02BB6-22AF-486C-AC47-18705CBDC918@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Jun 15, 2022, at 6:01 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2022-06-15 17:21:56 -0700, Mark Dilger wrote:
>> While developing various Table Access Methods, I have wanted a callback for
>> determining if CLUSTER (and VACUUM FULL) should be run against a table
>> backed by a given TAM. The current API contains a callback for doing the
>> guts of the cluster, but by that time, it's a bit too late to cleanly back
>> out. For single relation cluster commands, raising an error from that
>> callback is probably not too bad. For multi-relation cluster commands, that
>> aborts the clustering of other yet to be processed relations, which doesn't
>> seem acceptable.
>
> Why not? What else do you want to do in that case? Silently ignoring
> non-clusterable tables doesn't seem right either. What's the use-case for
> swallowing the error?

Imagine you develop a TAM for which the concept of "clustering" doesn't have any defined meaning. Perhaps you've arranged the data in a way that has no similarity to heap, and now somebody runs a CLUSTER command (with no arguments.) It's reasonable that they want all their heap tables to get the usual cluster_rel treatment, and that the existence of a table using your exotic TAM shouldn't interfere with that. Then what? You are forced to copy all the data from your OldHeap (badly named) to the NewHeap (also badly named), or to raise an error. That doesn't seem ok.

>> I tried fixing this with a ProcessUtility_hook, but that
>> fires before the multi-relation cluster command has compiled the list of
>> relations to cluster, meaning the ProcessUtility_hook doesn't have access to
>> the necessary information. (It can be hacked to compile the list of
>> relations itself, but that duplicates both code and effort, with the usual
>> risks that the code will get out of sync.)
>>
>> For my personal development, I have declared a new hook, bool
>> (*relation_supports_cluster) (Relation rel). It gets called from
>> commands/cluster.c in both the single-relation and multi-relation code
>> paths, with warning or debug log messages output for relations that decline
>> to be clustered, respectively.
>
> Do you actually need to dynamically decide whether CLUSTER is supported?
> Otherwise we could just make the existing cluster callback optional and error
> out if a table is clustered that doesn't have the callback.

Same as above, I don't know why erroring would be the right thing to do. As a comparison, consider that we don't attempt to cluster a partitioned table, but rather just silently skip it. Imagine if, when we introduced the concept of partitioned tables, we made unqualified CLUSTER commands always fail when they encountered a partitioned table.

>> Before posting a patch, do people think this sounds useful? Would you like
>> the hook function signature to differ in some way? Is a simple "yes this
>> relation may be clustered" vs. "no this relation may not be clustered"
>> interface overly simplistic?
>
> It seems overly complicated, if anything ;)

For the TAMs I've developed thus far, I don't need the (Relation rel) parameter, and could just have easily used (void). But that seems to fence in what other TAM authors could do in future.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-06-16 01:34:22 Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
Previous Message Mark Dilger 2022-06-16 01:16:21 Extending USING [heap | mytam | yourtam] grammar and behavior