From: | Sutou Kouhei <kou(at)clear-code(dot)com> |
---|---|
To: | sawada(dot)mshk(at)gmail(dot)com |
Cc: | michael(at)paquier(dot)xyz, david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2025-07-14 08:38:03 |
Message-ID: | 20250714.173803.865595983884510428.kou@clear-code.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
In <CAD21AoB0Z3gkOGALK3pXYmGTWATVvgDAmn-yXGp2mX64S-YrSw(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 14 Jul 2025 03:28:16 +0900,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> I've reviewed the 0001 and 0002 patches. The API implemented in the
> 0002 patch looks good to me, but I'm concerned about the capsulation
> of copy state data. With the v42 patches, we pass the whole
> CopyToStateData to the extension codes, but most of the fields in
> CopyToStateData are internal working state data that shouldn't be
> exposed to extensions. I think we need to sort out which fields are
> exposed or not. That way, it would be safer and we would be able to
> avoid exposing copyto_internal.h and extensions would not need to
> include copyfrom_internal.h.
FYI: We discussed this so far. For example:
> I think we can move CopyToState to copy.h and we don't
> need to have set/get functions for its fields.
> > What does "private" mean here? I thought that it means that
> > "PostgreSQL itself can use it". But it seems that you mean
> > that "PostgreSQL itself and custom format extensions can use
> > it but other extensions can't use it".
> >
> > I'm not familiar with "_internal.h" in PostgreSQL but is
> > "_internal.h" for the latter "private" mean?
>
> My understanding is that we don't strictly prohibit _internal.h from
> being included in out of core files. For example, file_fdw.c includes
> copyfrom_internal.h in order to access some fields of CopyFromState.
In general, I agree that we should export only needed
information.
How about adding accessors instead of splitting
Copy{From,To}State to Copy{From,To}ExecutionData? If we use
the accessors approach, we can export only needed
information step by step without breaking ABI.
The built-in formats can keep using Copy{From,To}State
directly with the accessors approach. We can avoid any
performance regression of the built-in formats. If we split
Copy{From,To}State to Copy{From,To}ExecutionData,
performance may be changed.
> I've implemented a draft patch for that idea. In the 0001 patch, I
> moved fields that are related to internal working state from
> CopyToStateData to CopyToExectuionData. COPY routine APIs pass a
> pointer of CopyToStateData but extensions can access only fields
> except for CopyToExectuionData. In the 0002 patch, I've implemented
> the registration API and some related APIs based on your v42 patch.
> I've made similar changes to COPY FROM codes too.
>
> The patch is a very PoC phase and we would need to scrutinize the
> fields that should or should not be exposed. Feedback is very welcome.
Based on our sample extensions [1][2], the following fields
may be minimal. I added "(*)" marks that exist in
Copy{From,To}StateDate in your patch. Other fields exist in
Copy{From,To}ExecutionData. We need to export them to
extensions. We can hide fields in Copy{From,To}StateData not
listed here.
FROM:
- attnumlist (*)
- bytes_processed
- cur_attname
- escontext
- in_functions (*)
- input_buf
- input_reached_eof
- line_buf
- opts (*)
- raw_buf
- raw_buf_index
- raw_buf_len
- rel (*)
- typioparams (*)
TO:
- attnumlist (*)
- fe_msgbuf
- opts (*)
[1] https://github.com/kou/pg-copy-arrow/
[2] https://github.com/MasahikoSawada/pg_copy_jsonlines/
Thanks,
--
kou
From | Date | Subject | |
---|---|---|---|
Next Message | Dmitry Koval | 2025-07-14 08:41:30 | Re: Add SPLIT PARTITION/MERGE PARTITIONS commands |
Previous Message | Peter Smith | 2025-07-14 08:37:41 | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |