Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: michael(at)paquier(dot)xyz, david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-07-18 10:05:53
Message-ID: 20250718.190553.1172585000083080334.kou@clear-code.com
Lists: pgsql-hackers

Hi,

In <CAD21AoAZL2RzPM4RLOJKm_73z5LXq2_VOVF+S+T0tnbjHdWTFA(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 17 Jul 2025 13:44:11 -0700,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

>> > How about adding accessors instead of splitting
>> > Copy{From,To}State to Copy{From,To}ExecutionData? If we use
>> > the accessors approach, we can export only needed
>> > information step by step without breaking ABI.
>
> Yeah, while it can export required fields without breaking ABI, I'm
> concerned that setter and getter functions could be bloated if we need
> to have them for many fields.

In general, I choose this approach in my projects even when
I need to define many accessors, because it lets me hide
implementation details from users and change them without
breaking API/ABI.

But PostgreSQL isn't my project. Is there any guideline for
PostgreSQL API(/ABI?) design that we can refer to for this
case?

FYI: We need to export at least the following fields:

https://www.postgresql.org/message-id/flat/20250714.173803.865595983884510428.kou%40clear-code.com#78fdbccf89742f856aa2cf95eaf42032

> FROM:
>
> - attnumlist (*)
> - bytes_processed
> - cur_attname
> - escontext
> - in_functions (*)
> - input_buf
> - input_reached_eof
> - line_buf
> - opts (*)
> - raw_buf
> - raw_buf_index
> - raw_buf_len
> - rel (*)
> - typioparams (*)
>
> TO:
>
> - attnumlist (*)
> - fe_msgbuf
> - opts (*)

Here are pros/cons of the Copy{From,To}ExecutionData
approach, right?

Pros:
1. We can hide internal data from extensions

Cons:
1. Built-in format routines need to refer to fields via
   Copy{From,To}ExecutionData.
   * This MAY have a performance impact. If there is no
     performance impact, this is not a con.
2. API/ABI compatibility will be broken when we change
   exported fields.
   * I'm not sure whether this is a con in the PostgreSQL
     design.
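
To make Cons 1 concrete, here is a minimal sketch of how a
split could look. This is only my illustration, not the
proposed patch: the struct layout, natts, fe_msgbuf_len,
fe_msgbuf_cap, file_encoding and builtin_append_byte() are
all hypothetical, and fe_msgbuf is a plain char buffer here
instead of a StringInfo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exported part: only what extensions need. */
typedef struct CopyToExecutionData
{
    const int  *attnumlist;     /* target column numbers */
    int         natts;          /* hypothetical: length of attnumlist */
    char       *fe_msgbuf;      /* output buffer (simplified) */
    size_t      fe_msgbuf_len;  /* hypothetical: bytes used */
    size_t      fe_msgbuf_cap;  /* hypothetical: buffer capacity */
} CopyToExecutionData;

/* Hypothetical internal part: never shown to extensions. */
typedef struct CopyToStateData
{
    CopyToExecutionData *edata; /* exported fields live behind this */
    int         file_encoding;  /* example of an internal-only field */
} CopyToStateData;

/*
 * A built-in routine now reaches the shared fields through edata.
 * This extra indirection is the possible performance impact.
 */
size_t
builtin_append_byte(CopyToStateData *cstate, char c)
{
    CopyToExecutionData *edata = cstate->edata;

    assert(edata != NULL && edata->fe_msgbuf != NULL);
    assert(edata->fe_msgbuf_len < edata->fe_msgbuf_cap);
    edata->fe_msgbuf[edata->fe_msgbuf_len++] = c;
    return edata->fe_msgbuf_len;
}
```

If a field is later added to or removed from the exported
struct, extensions compiled against the old layout break,
which is Cons 2.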

Here are pros/cons of the accessors approach:

Pros:
1. We can hide internal data from extensions
2. We can export new fields or change field names
   without breaking API/ABI compatibility
3. We don't need to change built-in format routines.
So we can assume that there is no performance impact.

Cons:
1. We may need to define many accessors
   * I'm not sure whether this is a con in the PostgreSQL
     design.
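
For reference, a minimal sketch of what I mean by accessors.
The accessor names below are hypothetical (nothing like them
exists in PostgreSQL today), and the struct is a simplified
stand-in for the real CopyFromStateData with only a few of
the fields listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Private side: the full definition stays in an internal header,
 * so built-in routines keep using cstate->field directly.
 */
typedef struct CopyFromStateData
{
    uint64_t    bytes_processed;
    const char *cur_attname;
    int         raw_buf_index;
    int         raw_buf_len;
} CopyFromStateData;

/* Public side: extensions would see only this typedef... */
typedef struct CopyFromStateData *CopyFromState;

/* ...and accessor functions such as these (hypothetical names). */
uint64_t
CopyFromStateGetBytesProcessed(CopyFromState cstate)
{
    assert(cstate != NULL);
    return cstate->bytes_processed;
}

void
CopyFromStateAddBytesProcessed(CopyFromState cstate, uint64_t nbytes)
{
    assert(cstate != NULL);
    cstate->bytes_processed += nbytes;
}

const char *
CopyFromStateGetCurAttname(CopyFromState cstate)
{
    assert(cstate != NULL);
    return cstate->cur_attname;
}
```

Renaming or moving bytes_processed inside the struct would
change only the accessor bodies, not the extension-facing
signatures.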

>> Another idea: We'll add Copy{From,To}State::opaque
>> eventually. (For example, the v40-0003 patch includes it.)
>>
>> How about using it to hide fields only for built-in formats?
>
> What is the difference between your idea and splitting CopyToState
> into CopyToState and CopyToExecutionData?

1. We don't need to manage two similar data structures for
   built-in formats and extensions.
   * Built-in formats use CopyToExecutionData and extensions
     use opaque.
2. We can introduce the registration API now.
   * We can work on this topic AFTER we introduce the
     registration API.
   * e.g.: Add registration API -> Add opaque -> Use opaque
     for internal fields (we will benchmark this
     implementation at this time)
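
A rough sketch of the last step ("Use opaque for internal
fields"), under my own assumptions: BuiltinCopyToPrivate, its
members and the builtin_*() functions are hypothetical names
for illustration, and calloc()/free() stand in for
palloc0()/pfree().

```c
#include <assert.h>
#include <stdlib.h>

/* State visible to every format implementation (simplified). */
typedef struct CopyToStateData
{
    const int  *attnumlist;     /* shared field */
    void       *opaque;         /* private data of the format in use */
} CopyToStateData;

/* Hypothetical: fields that only built-in formats need. */
typedef struct BuiltinCopyToPrivate
{
    int         need_transcoding;
    int         rows_written;
} BuiltinCopyToPrivate;

/* Built-in start callback: allocate the private data. */
int
builtin_start(CopyToStateData *cstate)
{
    BuiltinCopyToPrivate *priv = calloc(1, sizeof(*priv));

    if (priv == NULL)
        return -1;
    cstate->opaque = priv;
    return 0;
}

/* Built-in per-row callback: private fields are reached via opaque. */
int
builtin_one_row(CopyToStateData *cstate)
{
    BuiltinCopyToPrivate *priv = cstate->opaque;

    assert(priv != NULL);
    return ++priv->rows_written;
}

/* Built-in end callback: release the private data. */
void
builtin_end(CopyToStateData *cstate)
{
    free(cstate->opaque);
    cstate->opaque = NULL;
}
```

An extension would store its own struct in opaque in the same
way, so built-in formats and extensions share one mechanism.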

Thanks,
--
kou
