Re: Modest proposal to extend TableAM API for controlling cluster commands

From: Andres Freund <andres(at)anarazel(dot)de>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Modest proposal to extend TableAM API for controlling cluster commands
Date: 2022-06-16 03:18:50
Message-ID: 20220616031850.phyyk2qbazhwcusy@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-06-15 20:10:30 -0700, Mark Dilger wrote:
> > On Jun 15, 2022, at 7:30 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >> But it's up to the TAM what it wants to do with that boolean, if in fact it
> >> does anything at all based on that. A TAM could decide to do the exact
> >> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
> >> on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
> >> behavior here>.
> >
> > That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
> > that's possible with all extension APIs.
>
> I don't think it's "stupid stuff". A major motivation, perhaps the only
> useful motivation, for implementing a TAM is to get a non-trivial
> performance boost (relative to heap) for some target workload, almost
> certainly at the expense of worse performance for another workload. To
> optimize any particular workload sufficiently to make it worth the bother,
> you've pretty much got to do something meaningfully different than what heap
> does.

Sure. I just don't see what that has to do with doing something widely
differing in VACUUM FULL. Which AM can't support that? I guess there's some
where implementing the full visibility semantics isn't feasible, but that's
afaics OK.

> >> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
> >> synonyms. Or am I missing something?
> >
> > The callback gets passed use_sort. So just implement it use_sort = false and
> > error out if use_sort = true?
>
> I'm not going to say that your idea is unreasonable for a TAM that you might
> choose to implement, but I don't see why that should be required of all TAMs
> anybody might ever implement.

> The callback that gets use_sort is called from copy_table_data(). By that time, it's too late to avoid the
>
> /*
> * Open the relations we need.
> */
> NewHeap = table_open(OIDNewHeap, AccessExclusiveLock);
> OldHeap = table_open(OIDOldHeap, AccessExclusiveLock);
>
> code that has already happened in cluster.c's copy_table_data() function,
> and unless I raise an error, after returning from my TAM's callback, the
> cluster code will replace the old table with the new one. I'm left no
> choices but to copy my data over, loose my data, or abort the command. None
> of those are OK options for me.

I think you need to do a bit more explaining of what you're actually trying to
achieve here. You're just saying "I don't want to", which doesn't really help
me to understand the set of useful options.

> I'm open to different solutions. If a simple callback like
> relation_supports_cluster(Relation rel) is too simplistic, and more
> parameters with more context information is wanted, then fine, let's do
> that.

FWIW, I want to go *simpler* if anything, not more complicated. I.e. make the
relation_copy_for_cluster optional.

I still think it's a terrible idea to silently ignore tables in CLUSTER or
VACUUM FULL.

> But I can't really complete my work with the interface as it stands
> now.

Since you've not described that work to a meaningful degree...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-06-16 03:31:12 Re: [Proposal] pg_rewind integration into core
Previous Message Michael Paquier 2022-06-16 03:14:46 Re: warn if GUC set to an invalid shared library