From: | Sutou Kouhei <kou(at)clear-code(dot)com> |
---|---|
To: | sawada(dot)mshk(at)gmail(dot)com |
Cc: | david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2025-05-26 01:04:05 |
Message-ID: | 20250526.100405.383968457057016818.kou@clear-code.com |
Lists: | pgsql-hackers |
Hi,
In <CAD21AoBrSTmPyDai_QVR-XOe7PL722Dazm70A+FpvGy2hfSV9g(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 9 May 2025 17:57:35 -0700,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Proposed approaches to register custom COPY formats:
>> a. Create a function that has the same name of custom COPY
>> format
>> b. Call a register function from _PG_init()
>>
>> FYI: In another e-mail I proposed approach c., which uses a.
>> but always requires a schema name for the format name.
>
> With approach (c), do you mean that we require users to change all
> FORMAT option values like from 'text' to 'pg_catalog.text' after the
> upgrade? Or do we exempt the built-in formats?
The latter. 'text' must be accepted because existing pg_dump
output uses 'text'. Rejecting 'text' would be a big
incompatibility. (We couldn't dump on an old PostgreSQL and
restore to a new PostgreSQL.)
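To make the exemption concrete, here is a minimal sketch in Python of how approach c. could resolve a FORMAT value. It is only a model, not PostgreSQL code: the function name resolve_format, the "registered" set and the ext_x schema are hypothetical, and mapping unqualified built-in names to pg_catalog is an assumption.

```python
# Hypothetical model of approach c.; not PostgreSQL code.
BUILTIN_FORMATS = {"text", "csv", "binary"}


def resolve_format(name, registered):
    """Return (schema, format) for a FORMAT option value.

    `registered` is a set of (schema, format) pairs standing in for the
    handler functions that extensions would create.
    """
    schema, sep, fmt = name.rpartition(".")
    if not sep:
        # Unqualified name: only built-in formats are exempt from the
        # schema requirement, so old dumps that say 'text' still load.
        if name in BUILTIN_FORMATS:
            return ("pg_catalog", name)
        raise ValueError(f"format {name!r} needs a schema name")
    if (schema, fmt) in registered:
        return (schema, fmt)
    raise LookupError(f"unknown COPY format {name!r}")
```

With this model, 'text' keeps working unchanged, 'ext_x.jsonlines' resolves to the extension's format, and a bare 'jsonlines' is rejected.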
>> Users can register the same format name:
>> a. Yes
>> * Users can distinguish formats with the same name by schema name
>> * If format name doesn't have schema name, the used
>> format depends on search_path
>> * Pros:
>> * Using schema for it is consistent with other
>> PostgreSQL mechanisms
>> * Custom formats never conflict with built-in
>> formats. For example, if an extension registers "xml" and
>> PostgreSQL adds "xml" later, they never
>> conflict because PostgreSQL's "xml" is registered
>> in pg_catalog.
>> * Cons: Different formats may be used for the same
>> input. For example, "jsonlines" may resolve to the
>> "jsonlines" implemented by extension X or the one implemented
>> by extension Y when search_path is different.
>> b. No
>> * Users can use "${schema}.${name}" for format name
>> that mimics PostgreSQL's builtin schema (but it's just
>> a string)
>>
>>
>> Built-in formats (text/csv/binary) should be able to be
>> overwritten by extensions:
>> a. (The current patch says no but David's answer is) Yes
>> * Pros: Users can use a faster drop-in replacement
>> implementation without changing input
>> * Cons: Users may overwrite them accidentally.
>> It may break pg_dump results.
>> (This is called "backward incompatibility.")
>> b. No
>
> The summary matches my understanding. I think the second point is
> important. If we go with a tablesample-like API, I agree with David's
> point that all FORMAT values including the built-in formats should
> depend on the search_path value. While it provides a similar user
> experience to other database objects, there is a possibility that a
> COPY with built-in format could work differently on v19 than v18 or
> earlier depending on the search_path value.
Thanks for sharing additional points.
David said that the case in the additional point is the
responsibility of the DBA, not PostgreSQL, right?
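The search_path case can be modeled with a small Python sketch (not PostgreSQL code). The schemas ext_x and ext_y, the function name and the "registered" set are all made up for illustration, and search_path is taken to be the explicit, effective list of schemas in lookup order.

```python
# Hypothetical model of search_path-based format lookup; not PostgreSQL code.
def resolve_by_search_path(name, search_path, registered):
    """Return the first (schema, name) found by walking search_path."""
    for schema in search_path:
        if (schema, name) in registered:
            return (schema, name)
    raise LookupError(f"unknown COPY format {name!r}")


# Two extensions provide "jsonlines", and one also provides its own "text".
registered = {
    ("pg_catalog", "text"),
    ("ext_x", "jsonlines"),
    ("ext_y", "jsonlines"),
    ("ext_y", "text"),
}

# The same FORMAT value resolves differently as search_path changes.
print(resolve_by_search_path("jsonlines", ["ext_x", "ext_y"], registered))
print(resolve_by_search_path("jsonlines", ["ext_y", "ext_x"], registered))
print(resolve_by_search_path("text", ["pg_catalog", "ext_y"], registered))
print(resolve_by_search_path("text", ["ext_y", "pg_catalog"], registered))
```

In this model the last call picks ext_y's "text" rather than the built-in one, which is the kind of behavior change between versions that the additional point describes.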
As I already said, I don't have a strong opinion on which
approach is better. My opinion on the (important) second
point is no. I feel that the pro of a. isn't realistic. If
users want to improve text/csv/binary performance (or
something), they should improve PostgreSQL itself instead of
replacing it as an extension. (Or they should create another
custom copy format such as "faster_text" not "text".)
So I'm OK with approach b.
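As a rough illustration of what "No" means for overwriting, here is a Python sketch of a flat registry that an extension's _PG_init() might call into. It is only a model under stated assumptions: CopyFormatRegistry, register() and lookup() are hypothetical names, not an existing or proposed PostgreSQL API, and handlers are plain strings here.

```python
# Hypothetical model of a flat COPY format registry; not PostgreSQL code.
BUILTIN_FORMATS = ("text", "csv", "binary")


class CopyFormatRegistry:
    def __init__(self):
        # Built-in formats are present from the start.
        self._handlers = {name: f"builtin {name}" for name in BUILTIN_FORMATS}

    def register(self, name, handler):
        """Add a custom format; existing names can't be replaced."""
        if name in self._handlers:
            raise ValueError(f"COPY format {name!r} is already registered")
        self._handlers[name] = handler

    def lookup(self, name):
        """Return the handler registered for a FORMAT value."""
        return self._handlers[name]
```

Under this model an extension can register "faster_text", but an attempt to register "text" fails, so existing COPY commands and pg_dump output keep their meaning.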
>> Are there any missing or wrong items?
>
> I think the approach (b) provides more flexibility than (a) in terms
> of API design as with (a) we need to do everything based on one
> handler function and callbacks.
Thanks for sharing this missing point.
I have a concern that the flexibility may introduce needless
complexity. If it's not a real concern, I'm OK with
approach b.
>> If we can summarize
>> the current discussion here correctly, others will be able
>> to chime in on this discussion. (At least I can do it.)
>
> +1
Is anyone else interested in the design of the custom COPY
FORMAT implementation? If not, let's decide it among
ourselves.
Thanks,
--
kou