Re: Modest proposal to extend TableAM API for controlling cluster commands

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Modest proposal to extend TableAM API for controlling cluster commands
Date: 2022-06-16 03:10:30
Message-ID: FEBF35DC-8EC1-4D3A-9232-BA337FBCF423@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Jun 15, 2022, at 7:30 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> It's effectively a synonym which determines whether the "bool use_sort"
>> parameter of the table AM's relation_copy_for_cluster will be set. Heap-AM
>> plays along and sorts or not based on that.
>
> Hardly a synonym if it behaves differently?

I don't think this point is really worth arguing. We don't have to call it a synonym, and the rest of the discussion remains the same.

>> But it's up to the TAM what it wants to do with that boolean, if in fact it
>> does anything at all based on that. A TAM could decide to do the exact
>> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
>> on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
>> behavior here>.
>
> That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
> that's possible with all extension APIs.

I don't think it's "stupid stuff". A major motivation, perhaps the only useful motivation, for implementing a TAM is to get a non-trivial performance boost (relative to heap) for some target workload, almost certainly at the expense of worse performance for another workload. To optimize any particular workload sufficiently to make it worth the bother, you've pretty much got to do something meaningfully different than what heap does.

>> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
>> synonyms. Or am I missing something?
>
> The callback gets passed use_sort. So just implement it use_sort = false and
> error out if use_sort = true?

I'm not going to say that your idea is unreasonable for a TAM that you might choose to implement, but I don't see why that should be required of all TAMs anybody might ever implement.

The callback that gets use_sort is called from copy_table_data(). By that time, it's too late to avoid the

/*
* Open the relations we need.
*/
NewHeap = table_open(OIDNewHeap, AccessExclusiveLock);
OldHeap = table_open(OIDOldHeap, AccessExclusiveLock);

code that has already happened in cluster.c's copy_table_data() function, and unless I raise an error, after returning from my TAM's callback, the cluster code will replace the old table with the new one. I'm left no choices but to copy my data over, loose my data, or abort the command. None of those are OK options for me.

I'm open to different solutions. If a simple callback like relation_supports_cluster(Relation rel) is too simplistic, and more parameters with more context information is wanted, then fine, let's do that. But I can't really complete my work with the interface as it stands now.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-06-16 03:14:46 Re: warn if GUC set to an invalid shared library
Previous Message Kyotaro Horiguchi 2022-06-16 03:07:47 Re: Using PQexecQuery in pipeline mode produces unexpected Close messages