Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-05-01 19:15:30
Message-ID: CAD21AoBuEqcz2_+dpA3WTiDUF=FgudPBKwM+nvH+qHT-k4p5mA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 25, 2025 at 5:45 AM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> I've updated the patch set. See the attached v40 patch set.
>
> In <CAD21AoAXzwPC7jjPMTcT80hnzmPa2SUJkiqdYHweEY8sZscEMA(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 23 Apr 2025 23:44:55 -0700,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> >> Are the followings correct?
> >>
> >> 1. Move invalid input patterns in
> >> src/test/modules/test_copy_format/sql/invalid.sql to
> >> src/test/regress/sql/copy.sql as much as possible.
> >> 2. Create
> >> src/test/modules/test_copy_format/sql/test_copy_format.sql
> >> and move all contents in existing *.sql to the file.
> >> 3. Add comments what the tests expect to
> >> src/test/modules/test_copy_format/sql/test_copy_format.sql.
> >> 4. Remove CopyFormatOptions::{binary,csv_mode}.
> >
> > Agreed with the above items.
>
> Done except 1. because 1. is removed by 3. in the following
> list:
>
> ----
> >> There are 3 unconfirmed suggested changes for tests in:
> >> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
> >>
> >> Here are my opinions for them:
> >>
> >> > 1.: There is no difference between single-quoting and
> >> > double-quoting here. Because the information what quote
> >> > was used for the given FORMAT value isn't remained
> >> > here. Should we update gram.y?
> >> >
> >> > 2.: I don't have a strong opinion for it. If nobody objects
> >> > it, I'll remove them.
> >> >
> >> > 3.: I don't have a strong opinion for it. If nobody objects
> >> > it, I'll remove them.
> ----
>
> 0005 is added for 4. Could you squash 0004 ("Use copy
> handler for bult-in formats") and 0005 ("Remove
> CopyFormatOptions::{binary,csv_mode}") if needed when you
> push?
>
> >> 6. Use handler OID for detecting the default built-in format
> >> instead of comparing the given format as string.
>
> Done.
>
> >> 7. Update documentation.
>
> Could someone help this? 0007 is the draft commit for this.
>
> >> There are 3 unconfirmed suggested changes for tests in:
> >> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
> >>
> >> Here are my opinions for them:
> >>
> >> > 1.: There is no difference between single-quoting and
> >> > double-quoting here. Because the information what quote
> >> > was used for the given FORMAT value isn't remained
> >> > here. Should we update gram.y?
> >> >
> >> > 2.: I don't have a strong opinion for it. If nobody objects
> >> > it, I'll remove them.
> >> >
> >> > 3.: I don't have a strong opinion for it. If nobody objects
> >> > it, I'll remove them.
> >>
> >> Is the 1. required for "ready for merge"? If so, is there
> >> any suggestion? I don't have a strong opinion for it.
> >>
> >> If there are no more opinions for 2. and 3., I'll remove
> >> them.
> >
> > Agreed.
>
> 1.: I didn't do anything. Because there is no suggestion.
>
> 2., 3.: Done.

Thank you for updating the patches.

One of the primary considerations we need to address is the treatment
of the specified format name. The current patch set utilizes built-in
formats (namely 'csv', 'text', and 'binary') when the format name is
either unqualified or explicitly specified with 'pg_catalog' as the
schema. In all other cases, we search for custom format handler
functions based on the search_path. To be frank, I have reservations
about this interface design, as the dependence of the specified custom
format name on the search_path could potentially confuse users.

In light of these concerns, I've been contemplating alternative
interface designs. One promising approach would involve registering
custom copy formats via a C function during module loading
(specifically, in _PG_init()). This method would require extension
authors to invoke a registration function, say
RegisterCustomCopyFormat(), in _PG_init() as follows:

JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines",
&JsonLinesCopyToRoutine,
&JsonLinesCopyFromRoutine);

The registration function would validate the format name and store it
in TopMemoryContext. It would then return a unique identifier that can
be used subsequently to reference the custom copy format extension.

Custom copy format modules could be loaded through
shared_preload_libraries, session_preload_libraries, or the LOAD
command. Extensions could register their own options within this
framework, for example:

RegisterCustomCopyFormatOption(JsonLinesFormatId,
"custom_option",
custom_option_handler);

This approach offers several advantages: it would eliminate the
search_path issue, provide greater flexibility, and potentially
simplify the overall interface for users and developers alike. We
might be able to provide a view showing the registered custom COPY
format in the future. Also, these interfaces align with other
customizable functionalities such as custom rmgr, custom lwlock,
custom waitevent, and custom EXPLAIN option etc.

Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2025-05-01 19:24:19 Re: [PoC] Federated Authn/z with OAUTHBEARER
Previous Message Ilia Evdokimov 2025-05-01 19:10:56 Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment