Re: TABLESAMPLE patch is really in pretty sad shape

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TABLESAMPLE patch is really in pretty sad shape
Date: 2015-07-13 08:46:08
Message-ID: CAB7nPqQk-SRhOUn612XMwL0DyXjBvRZ9gWnKbHb924TpN4=G1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 13, 2015 at 7:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> The two contrib modules this patch added are nowhere near fit for public
>> consumption. They cannot clean up after themselves when dropped:
>> ...
>> Raw inserts into system catalogs just
>> aren't a sane thing to do in extensions.
>
> I had some thoughts about how we might fix that, without going to the
> rather tedious lengths of creating a complete set of DDL infrastructure
> for CREATE/DROP TABLESAMPLE METHOD.
>
> First off, the extension API designed for the tablesample patch is
> evidently modeled on the index AM API, which was not a particularly good
> precedent --- it's not a coincidence that index AMs can't be added or
> dropped on-the-fly. Modeling a server internal API as a set of
> SQL-visible functions is problematic when the call signatures of those
> functions can't be accurately described by SQL datatypes, and it's rather
> pointless and inefficient when none of the functions in question is meant
> to be SQL-callable. It's even less attractive if you don't think you've
> got a completely stable API spec, because adding, dropping, or changing
> signature of any one of the API functions then involves a pile of
> easy-to-mess-up catalog changes on top of the actually useful work.
> Not to mention then having to think about backwards compatibility of
> your CREATE command's arguments.
>
> We have a far better model to follow already, namely the foreign data
> wrapper API. In that, there's a single SQL-visible function that returns
> a dummy datatype indicating that it's an FDW handler, and when called,
> it hands back a struct containing pointers to all the other functions
> that the particular wrapper needs to supply (and, if necessary, the struct
> could have non-function-pointer fields containing other info the core
> system might need to know about the handler). We could similarly invent a
> pseudotype "tablesample_handler" and represent each tablesample method by
> a single SQL function returning tablesample_handler. Any future changes
> in the API for tablesample handlers would then appear as changes in the C
> definition of the struct returned by the handler, which requires no
> SQL-visible thrashing, hence creates no headaches for pg_upgrade, and it
> is pretty easy to design it so that failure to update an external module
> that implements the API results in C compiler errors and/or sane fallback
> behavior.

That's elegant.

> Once we've done that, I think we don't even need a special catalog
> representing tablesample methods. Given "FROM foo TABLESAMPLE
> bernoulli(...)", the parser could just look for a function bernoulli()
> returning tablesample_handler, and it's done. The querytree would have an
> ordinary function dependency on that function, which would be sufficient
> to handle DROP dependency behaviors properly. (On reflection, maybe
> better if it's "bernoulli(internal) returns tablesample_handler",
> so as to guarantee that such a function doesn't create a conflict with
> any user-defined function of the same name.)
>
> Thoughts?

Regarding the fact that those two contrib modules can be part of a
-contrib package and could be installed, nuking those two extensions
from the tree and preventing the creating of custom tablesample
methods looks like a correct course of things to do for 9.5.

When looking at this patch I was as well surprised that the BERNOUILLI
method can only be applied on a physical relation and was not able to
fire on a materialized set of tuples, say the result of a WITH clause
for example.

> PS: now that I've written this rant, I wonder why we don't redesign the
> index AM API along the same lines. It probably doesn't matter much at
> the moment, but if we ever get serious about supporting index AM
> extensions, I think we ought to consider doing that.

Any API redesign looks to be clearly 9.6 work IMO at this point.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2015-07-13 08:51:44 Re: multivariate statistics / patch v7
Previous Message Michael Paquier 2015-07-13 08:11:56 Re: [PATCH] SQL function to report log message