Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: michael(at)paquier(dot)xyz, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2023-12-14 20:19:43
Message-ID: CAD21AoCZv3cVU+NxR2s9J_dWvjrS350GFFr2vMgCH8wWxQ5hTQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 14, 2023 at 6:44 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoCvjGserrtEU=UcA3Mfyfe6ftf9OXPHv9fiJ9DmXMJ2nQ(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Dec 2023 10:57:15 +0900,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > IIUC we cannot create two same name functions with the same arguments
> > but a different return value type in the first place. It seems to me
> > to be an overkill to change such a design.
>
> Oh, sorry. I didn't notice it.
>
> > Another idea is to encapsulate copy_to/from_handler by a super class
> > like copy_handler. The handler function is called with an argument,
> > say copyto, and returns copy_handler encapsulating either
> > copy_to/from_handler depending on the argument.
>
> It's for using "${copy_format_name}" such as "json" and
> "parquet" as a function name, right?

Right.

> If we use the
> "${copy_format_name}" approach, we can't use function names
> that are already used by tablesample method handler such as
> "system" and "bernoulli" for COPY FORMAT name. Because both
> of tablesample method handler function and COPY FORMAT
> handler function use "(internal)" as arguments.
>
> I think that tablesample method names and COPY FORMAT names
> will not be conflicted but the limitation (using the same
> namespace for tablesample method and COPY FORMAT) is
> unnecessary limitation.

Presumably, such function name collisions are not limited to
tablesample and COPY; they apply to any functions that share a name
and take only an "internal" argument. To avoid collisions, extensions
can be created in a schema other than public. Also note that the
built-in COPY formats don't need to declare handler functions.

>
> How about using prefix ("copy_to_${copy_format_name}" or
> something) or suffix ("${copy_format_name}_copy_to" or
> something) for function names? For example,
> "copy_to_json"/"copy_from_json" for "json" COPY FORMAT.
>
> ("copy_${copy_format_name}" that returns copy_handler
> encapsulating either copy_to/from_handler depending on the
> argument may be an option.)

While there is a way to avoid collisions, as I mentioned above, I can
see the point that we might want to avoid using generic function
names such as "arrow" and "parquet" for custom copy handler functions.
Adding a prefix or suffix would be one option, but to give extensions
more flexibility, another option would be to support format = 'custom'
and add a "handler" option that specifies the name of the copy handler
function to call. For example, COPY ... FROM ... WITH (FORMAT = 'custom',
HANDLER = 'arrow_copy_handler').
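
To make the idea concrete, here is a rough, standalone C sketch of a
single handler per format that returns either the COPY TO or the COPY
FROM callbacks depending on its argument, looked up by the name given
in the HANDLER option. All type and function names here (CopyHandler,
CopyToRoutine, CopyFromRoutine, lookup_handler, describe_copy) are
made up for illustration and are not an existing PostgreSQL API; the
real lookup would go through the catalog rather than a static table.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-direction callback tables (illustration only). */
typedef struct CopyToRoutine   { const char *(*describe)(void); } CopyToRoutine;
typedef struct CopyFromRoutine { const char *(*describe)(void); } CopyFromRoutine;

/* Hypothetical "super class": wraps exactly one of the two routines. */
typedef struct CopyHandler
{
    bool                   is_from;
    const CopyToRoutine   *to;    /* set when is_from is false */
    const CopyFromRoutine *from;  /* set when is_from is true */
} CopyHandler;

static const char *arrow_to_describe(void)   { return "arrow: COPY TO"; }
static const char *arrow_from_describe(void) { return "arrow: COPY FROM"; }

static const CopyToRoutine   arrow_to   = { arrow_to_describe };
static const CopyFromRoutine arrow_from = { arrow_from_describe };

/* One handler function per format; the argument selects the direction. */
static CopyHandler arrow_copy_handler(bool is_from)
{
    CopyHandler h = { is_from, NULL, NULL };

    if (is_from)
        h.from = &arrow_from;
    else
        h.to = &arrow_to;
    return h;
}

typedef CopyHandler (*CopyHandlerFn)(bool is_from);

/* Stand-in for resolving the HANDLER option value to a function. */
typedef struct HandlerEntry { const char *name; CopyHandlerFn fn; } HandlerEntry;

static const HandlerEntry handlers[] = {
    { "arrow_copy_handler", arrow_copy_handler },
};

static CopyHandlerFn lookup_handler(const char *name)
{
    size_t n = sizeof(handlers) / sizeof(handlers[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(handlers[i].name, name) == 0)
            return handlers[i].fn;
    return NULL;                /* unknown handler name */
}

/* Resolve the handler by name, then call the routine for the direction. */
static const char *describe_copy(const char *handler_name, bool is_from)
{
    CopyHandlerFn fn = lookup_handler(handler_name);

    if (fn == NULL)
        return NULL;

    CopyHandler h = fn(is_from);
    return h.is_from ? h.from->describe() : h.to->describe();
}
```

With this shape, describe_copy("arrow_copy_handler", true) reaches the
COPY FROM callbacks and the same handler name with false reaches the
COPY TO callbacks, so a format needs only one function name regardless
of direction.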

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
