Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Junwang Zhao <zhjwpku(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Sutou Kouhei <kou(at)clear-code(dot)com>, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2023-12-08 06:42:06
Message-ID: CAD21AoDkoGL6yJ_HjNOg9cU=aAdW8uQ3rSQOeRS0SX85LPPNwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 8, 2023 at 2:17 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Dec 08, 2023 at 10:32:27AM +0800, Junwang Zhao wrote:
> > I can see FDW related utility commands but no TABLESAMPLE related,
> > and there is a pg_foreign_data_wrapper system catalog which has
> > a *fdwhandler* field.
>
> + */ +CATALOG(pg_copy_handler,4551,CopyHandlerRelationId)
>
> Using a catalog is an over-engineered design. Others have provided
> hints about that upthread, but it would be enough to have one or two
> handler types that are wrapped around one or two SQL *functions*, like
> tablesamples. It seems like you've missed it, but feel free to read
> about tablesample-method.sgml, that explains how this is achieved for
> tablesamples.

Agreed. My previous example of FDW was not a good one, I missed something.

>
> > If we want extensions to create a new copy handler, I think
> > something like pg_copy_hander should be necessary.
>
> A catalog is not necessary, that's the point, because it can be
> replaced by a scan of pg_proc with the function name defined in a COPY
> query (be it through a FORMAT, or different option in a DefElem).
> An example of extension with tablesamples is contrib/tsm_system_rows/,
> that just uses a function returning a tsm_handler:
> CREATE FUNCTION system_rows(internal)
> RETURNS tsm_handler
> AS 'MODULE_PATHNAME', 'tsm_system_rows_handler'
> LANGUAGE C STRICT;
>
> Then SELECT queries rely on the contents of the TABLESAMPLE clause to
> find the set of callbacks it should use by calling the function.
>
> +/* Routines for a COPY HANDLER implementation. */
> +typedef struct CopyRoutine
> +{
>
> FWIW, I find weird the concept of having one handler for both COPY
> FROM and COPY TO as each one of them has callbacks that are mutually
> exclusive to the other, but I'm OK if there is a consensus of only
> one. So I'd suggest to use *two* NodeTags instead for a cleaner
> split, meaning that we'd need two functions for each method. My point
> is that a custom COPY handler could just define a COPY TO handler or a
> COPY FROM handler, though it mostly comes down to a matter of taste
> regarding how clean the error handling becomes if one tries to use a
> set of callbacks with a COPY type (TO or FROM) not matching it.

I tend to agree to have separate two functions for each method. But
given we implement it in tablesample-way, I think we need to make it
clear how to call one of the two functions depending on COPY TO and
FROM.

IIUC in tablesamples cases, we scan pg_proc to find the handler
function like system_rows(internal) by the method name specified in
the query. On the other hand, in COPY cases, the queries would be
going to be like:

COPY tab TO stdout WITH (format = 'arrow');
and
COPY tab FROM stdin WITH (format = 'arrow');

So a custom COPY extension would not be able to define SQL functions
just like arrow(internal) for example. We might need to define a rule
like the function returning copy_in/out_handler must be defined as
<method name>_to(internal) and <method_name>_from(internal).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2023-12-08 06:45:12 Re: [PoC] Improve dead tuple storage for lazy vacuum
Previous Message Michael Paquier 2023-12-08 05:59:41 Re: [PATCH] New [relation] option engine