Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-05-26 01:04:05
Message-ID: 20250526.100405.383968457057016818.kou@clear-code.com
Lists: pgsql-hackers

Hi,

In <CAD21AoBrSTmPyDai_QVR-XOe7PL722Dazm70A+FpvGy2hfSV9g(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 9 May 2025 17:57:35 -0700,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

>> Proposed approaches to register custom COPY formats:
>> a. Create a function that has the same name as the custom
>> COPY format
>> b. Call a register function from _PG_init()
>>
>> FYI: In another e-mail I proposed approach c., which uses
>> a. but always requires a schema name for the format name.
>
> With approach (c), do you mean that we require users to change all
> FORMAT option values, e.g. from 'text' to 'pg_catalog.text', after the
> upgrade? Or do we exempt the built-in formats?

The latter. 'text' must be accepted because existing pg_dump
results use 'text'. If we rejected 'text', it would be a big
incompatibility. (We couldn't dump from an old PostgreSQL
and restore into a new PostgreSQL.)
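
A toy model of that rule may help, written in Python only for
brevity. It is not PostgreSQL code: resolve_format(), the
custom_formats table and the ext_x schema are all hypothetical
names. It assumes approach c. with the built-in formats exempted
from the schema requirement.

```python
# Toy model of approach (c) with built-in formats exempted.
# Nothing here is PostgreSQL API; every name is hypothetical.

BUILTIN_FORMATS = {"text", "csv", "binary"}

# Custom formats keyed by (schema, name), as if created by extensions.
custom_formats = {("ext_x", "jsonlines"): "jsonlines from extension X"}


def resolve_format(format_name):
    """Resolve a FORMAT option value under the toy rule."""
    schema, dot, name = format_name.rpartition(".")
    if not dot:
        # Unqualified: only built-ins are accepted, so old dumps keep working.
        if format_name in BUILTIN_FORMATS:
            return "built-in " + format_name
        raise ValueError(f"format {format_name!r} requires a schema name")
    if schema == "pg_catalog" and name in BUILTIN_FORMATS:
        return "built-in " + name
    try:
        return custom_formats[(schema, name)]
    except KeyError:
        raise ValueError(f"unknown format {format_name!r}") from None
```

Here resolve_format("text") and resolve_format("pg_catalog.text")
give the same built-in format, while an unqualified "jsonlines"
is rejected and "ext_x.jsonlines" is accepted.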

>> Users can register the same format name:
>>   a. Yes
>>      * Users can distinguish formats with the same name by
>>        schema name
>>      * If the format name doesn't have a schema name, the
>>        format used depends on search_path
>>      * Pros:
>>        * Using schemas for this is consistent with other
>>          PostgreSQL mechanisms
>>        * A custom format never conflicts with a built-in
>>          format. For example, if an extension registers
>>          "xml" and PostgreSQL adds "xml" later, they never
>>          conflict because PostgreSQL's "xml" is registered
>>          in pg_catalog.
>>      * Cons: A different format may be used for the same
>>        input. For example, "jsonlines" may resolve to the
>>        "jsonlines" implemented by extension X or the one
>>        implemented by extension Y, depending on
>>        search_path.
>>   b. No
>>      * Users can use "${schema}.${name}" as a format name
>>        to mimic PostgreSQL's built-in schemas (but it's
>>        just a string)
>>
>>
>> Built-in formats (text/csv/binary) should be able to be
>> overwritten by extensions:
>>   a. (The current patch says no, but David's answer is) Yes
>>      * Pros: Users can use a faster drop-in replacement
>>        implementation without changing the input
>>      * Cons: Users may overwrite them accidentally.
>>        It may break pg_dump results.
>>        (This is called "backward incompatibility.")
>>   b. No
>
> The summary matches my understanding. I think the second point is
> important. If we go with a tablesample-like API, I agree with David's
> point that all FORMAT values including the built-in formats should
> depend on the search_path value. While it provides a similar user
> experience to other database objects, there is a possibility that a
> COPY with built-in format could work differently on v19 than v18 or
> earlier depending on the search_path value.

Thanks for sharing additional points.

David said that the case in the additional point is the
responsibility of the DBA, not PostgreSQL, right?

As I already said, I don't have a strong opinion on which
approach is better. My answer to the (important) second
point is no. I feel that the pro of a. isn't realistic. If
users want to improve text/csv/binary performance (or
something else), they should improve PostgreSQL itself
instead of replacing the format with an extension. (Or they
should create another custom COPY format such as
"faster_text", not "text".)

So I'm OK with the approach b.
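
For anyone joining the thread, the search_path dependence
discussed above can be sketched with a toy lookup, again in
Python and not PostgreSQL code. The schemas ext_x, ext_y and
ext_fast and the lookup() function are hypothetical. The one
real rule it borrows is that PostgreSQL searches pg_catalog
first when search_path does not list it explicitly.

```python
# Toy model of search_path-based format lookup (the "a. Yes" options).
# Not PostgreSQL code; schemas and implementations are hypothetical.

formats = {
    ("pg_catalog", "text"): "built-in text",
    ("ext_x", "jsonlines"): "jsonlines (extension X)",
    ("ext_y", "jsonlines"): "jsonlines (extension Y)",
    ("ext_fast", "text"): "text (drop-in replacement)",
}


def lookup(name, search_path):
    """Return the first match for an unqualified name along search_path."""
    # pg_catalog is searched first unless it is listed explicitly.
    path = list(search_path)
    if "pg_catalog" not in path:
        path.insert(0, "pg_catalog")
    for schema in path:
        if (schema, name) in formats:
            return formats[(schema, name)]
    raise LookupError(f"format {name!r} not found")
```

With search_path "ext_x, ext_y" the name "jsonlines" resolves to
extension X, and with "ext_y, ext_x" to extension Y. In this
model "text" stays built-in unless ext_fast is placed before an
explicitly listed pg_catalog, which is the case where the same
COPY command could behave differently on a new version.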

>> Are there any missing or wrong items?
>
> I think the approach (b) provides more flexibility than (a) in terms
> of API design as with (a) we need to do everything based on one
> handler function and callbacks.

Thanks for sharing this missing point.

I have a concern that the flexibility may introduce needless
complexity. If it's not a real concern, I'm OK with the
approach b.
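
As a rough illustration of the register-function style, here is
a toy registry in Python. It is not the patch's API:
register_copy_format() and copy_to() are hypothetical names, and
the sketch assumes a flat namespace where built-in formats
cannot be replaced and duplicate names are rejected.

```python
# Toy model of approach (b): explicit registration into a flat namespace.
# Not PostgreSQL API; register_copy_format() is a hypothetical name.
import json

BUILTIN_FORMATS = ("text", "csv", "binary")
_registry = {}


def register_copy_format(name, to_row):
    """Register a custom format; to_row turns a tuple of values into a line."""
    if name in BUILTIN_FORMATS:
        raise ValueError(f"cannot replace built-in format {name!r}")
    if name in _registry:
        raise ValueError(f"format {name!r} is already registered")
    _registry[name] = to_row


def copy_to(rows, fmt):
    """Render rows with a registered custom format."""
    return [_registry[fmt](row) for row in rows]


# What an extension would do once at load time (its _PG_init() analogue).
# "x.jsonlines" only looks schema-qualified; here it is just a string.
register_copy_format("x.jsonlines", lambda row: json.dumps(list(row)))
```

Because registration is an ordinary function call, the
registering side can pass whatever it likes, which is where both
the extra flexibility and the possible extra complexity come
from.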

>> If we can summarize
>> the current discussion here correctly, others will be able
>> to chime in on this discussion. (At least I can do it.)
>
> +1

Is anyone else interested in the design of the custom COPY
FORMAT implementation? If not, let's decide it between
ourselves.

Thanks,
--
kou
