Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: michael(at)paquier(dot)xyz, david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-06-24 02:59:17
Message-ID: CAD21AoA57owo6qYTPTxOtCjDmcuj1tGL1aGs95cvEnoLQvwF0A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 18, 2025 at 12:59 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <aFC5HmZHU5NCPuTL(at)paquier(dot)xyz>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 17 Jun 2025 09:38:54 +0900,
> Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> > On Tue, Jun 17, 2025 at 08:50:37AM +0900, Sutou Kouhei wrote:
> >> OK. I'll implement the initial version with this
> >> design. (Allocating IDs local not shared.)
> >
> > Sounds good to me. Thanks Sutou-san!
>
> I've attached the v41 patch set that uses the C API approach
> with local (not shared) COPY routine management.
>
> 0001: This is same as 0001 in the v40 patch set. It just
> cleans up CopySource and CopyDest enums.
> 0002: This is the initial version of this approach.

Thank you for updating the patches!

> Here are some discussion points:
>
> 1. This provides 2 registration APIs
> (RegisterCopy{From,To}Routine(name, routine)) instead of
> 1 registration API (RegisterCopyFormat(name,
> from_routine, to_routine)).
>
> It's for simple implementation and easy to extend without
> breaking APIs in the future. (And some formats may
> provide only FROM routine or TO routine.)
>
> Is this design acceptable?

With the single registration API idea, we can register the custom
format routine that supports only FROM routine using the API like:

RegisterCopyRoutine("new-format", NewFormatFromRoutine, NULL);

Compared to this approach, what points do you think having separate
two registration APIs is preferable in terms of extendability and API
compatibility? I think it would be rather confusing for example if
each COPY TO routine and COPY FROM routine is registered by different
extensions with the same format name.

> FYI: RegisterCopy{From,To}Routine() uses the same logic
> as RegisterExtensionExplainOption().

I'm concerned that the patch has duplicated logics for the
registration of COPY FROM and COPY TO.

>
> 2. This allocates IDs internally but doesn't provide APIs
> that get them. Because it's not needed for now.
>
> We can provide GetExplainExtensionId() like API when we
> need it.
>
> Is this design acceptable?

+1

>
> 3. I want to register the built-in COPY {FROM,TO} routines
> in the PostgreSQL initialization phase. Where should we
> do it? In 0002, it's done in InitPostgres() but I'm not
> sure whether it's a suitable location or not.

InitPostgres() is not a correct function as it's a process
initialization function. Probably we don't necessarily need to
register the built-in formats in the same way as custom formats. A
simple solution would be to have separate arrays for built-in formats
and custom formats and have the GetCopy[To|From]Routine() search both
arrays (built-in array first).

> 4. 0002 adds CopyFormatOptions::routine as union:
>
> @@ -87,9 +91,14 @@ typedef struct CopyFormatOptions
> CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */
> int64 reject_limit; /* maximum tolerable number of errors */
> List *convert_select; /* list of column names (can be NIL) */
> + union
> + {
> + const struct CopyFromRoutine *from; /* for COPY FROM */
> + const struct CopyToRoutine *to; /* for COPY TO */
> + } routine; /* routine to process the specified format */
> } CopyFormatOptions;
>
> Because one of Copy{From,To}Routine is only needed at
> once. Is this union usage strange in PostgreSQL?

I think we can live with having two fields as there are other options
that are used only in either COPY FROM or COPY TO.

>
> 5. 0002 adds InitializeCopy{From,To}Routines() and
> GetCopy{From,To}Routine() that aren't used by COPY
> {FROM,TO} routine implementations to copyapi.h. Should we
> move them to other .h? If so, which .h should be used for
> them?

As I commented at 3, I think it's better to avoid dynamically
registering the built-in formats.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo Nagata 2025-06-24 03:19:48 Re: Per-role disabling of LEAKPROOF requirements for row-level security?
Previous Message Tomas Vondra 2025-06-24 01:43:19 Re: pgsql: Introduce pg_shmem_allocations_numa view