| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Sutou Kouhei <kou(at)clear-code(dot)com> |
| Cc: | tomas(at)vondra(dot)me, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
| Date: | 2026-06-23 01:06:07 |
| Message-ID: | CAD21AoCnA7vayZAOmwVqTSOyWfyBhyxH7mBb4UzjskF-eZ+_Jg@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi,
On Thu, Mar 26, 2026 at 6:36 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoCLxUhQ0uBjDKXvCEtJBCfF13Ru_7u-Qrrsu+0PPUqcPQ(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 18 Dec 2025 15:43:07 -0800,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > Looking at these results, it seems that 0001-from-binary cases and
> > 0006-to-binary cases are slower throughout the six results?
>
> Good point. I didn't notice them. But I don't think it's
> related to the patch set, because 0001 doesn't change any
> COPY FROM related code; it only changes COPY TO related
> code. And 0006 just adds tests; it doesn't change any
> implementation.
>
>
> BTW, how should we proceed with this proposal? It seems that
> we can't move it forward without PostgreSQL committers'
> attention, but that seems difficult to get.
Sorry for going quiet on this for a while -- I haven't had time to
work on it until now.

After more thought, I'd like to keep the custom-format changes to the
bare minimum and not disturb the existing built-in format processing.
In particular, I've dropped the earlier rework that split
CopyToStateData / CopyFromStateData to hide built-in-specific fields
from extensions. That was my own idea, but I no longer think it pays
off: the fields it hid (raw_buf, line_buf, the input buffers, etc.)
are only ever used by the built-in text/CSV/binary parsers, and a
custom format never touches them -- so visible or not, nothing depends
on them, while splitting the struct is invasive to the existing format
processing. Touching the Copy state structs is fine in itself; it's
the hiding that wasn't worth the cost.

Instead, each state struct just gets one opaque pointer for a custom
format to keep its own state, and the existing code paths are left
alone.
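
To illustrate the opaque-pointer idea, here is a minimal standalone sketch. It is not code from the attached patches: `DemoCopyToState`, the `format_private` field name, and the `demo_format_*` functions are all hypothetical stand-ins, and the real struct layout and callback names may well differ. The point is only that the core struct carries a single `void *` slot, and the custom format allocates, uses, and frees its own state behind it without touching any other field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a COPY TO state struct (NOT the real CopyToStateData). */
typedef struct DemoCopyToState
{
    int   natts;           /* example of a field owned by the core code */
    void *format_private;  /* assumed name: opaque slot owned by the custom format */
} DemoCopyToState;

/* Private state of a made-up custom format; the core code never looks inside. */
typedef struct DemoFormatState
{
    size_t rows_written;
} DemoFormatState;

/* Start callback (hypothetical): allocate the private state and stash it in the slot. */
static void
demo_format_start(DemoCopyToState *cstate)
{
    DemoFormatState *priv = calloc(1, sizeof(*priv));

    assert(priv != NULL);
    cstate->format_private = priv;
}

/* Per-row callback (hypothetical): recover the private state from the slot. */
static void
demo_format_one_row(DemoCopyToState *cstate)
{
    DemoFormatState *priv = cstate->format_private;

    priv->rows_written++;
}

/* End callback (hypothetical): release the private state, return the row count. */
static size_t
demo_format_end(DemoCopyToState *cstate)
{
    DemoFormatState *priv = cstate->format_private;
    size_t           nrows = priv->rows_written;

    free(priv);
    cstate->format_private = NULL;
    return nrows;
}
```

In this sketch the built-in fields of the struct are never read or written by the format callbacks, which mirrors the goal of leaving the existing code paths alone.
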
Updated patches attached:
- 0001 moves CopyFromStateData and CopyToStateData to a new
copy_state.h, so extensions can implement their routines without
including the *_internal.h headers. It also drops file_fdw.c's
dependency on copyfrom_internal.h.
- 0002 introduces the registration API and the opaque per-format
pointer in both structs.
- 0003 adds a callback to validate the COPY options as a whole, called
after all options are processed.
- 0004 adds the regression tests.
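
As a rough illustration of how 0002 and 0003 could fit together, the following standalone sketch registers a format by name and runs a validation callback once, after all options are known. It is not taken from the patches: every `Demo*` / `demo_*` name, the option struct, and the "jsonlines" format are hypothetical, and the actual registration API and hook signature may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, already-parsed COPY options. */
typedef struct DemoCopyOptions
{
    bool        header;
    const char *delimiter;   /* NULL if the option was not given */
} DemoCopyOptions;

/* Hypothetical per-format routine table. */
typedef struct DemoCopyFormatRoutine
{
    const char *name;
    /* Called once after all options are processed; false rejects them. */
    bool      (*validate_options) (const DemoCopyOptions *opts);
} DemoCopyFormatRoutine;

#define DEMO_MAX_FORMATS 8
static const DemoCopyFormatRoutine *demo_formats[DEMO_MAX_FORMATS];
static int  demo_nformats = 0;

/* Look up a registered format by name; NULL if unknown. */
static const DemoCopyFormatRoutine *
demo_lookup_format(const char *name)
{
    for (int i = 0; i < demo_nformats; i++)
        if (strcmp(demo_formats[i]->name, name) == 0)
            return demo_formats[i];
    return NULL;
}

/* Register a format; fails if the table is full or the name is taken. */
static bool
demo_register_format(const DemoCopyFormatRoutine *routine)
{
    assert(routine != NULL && routine->name != NULL);
    if (demo_nformats >= DEMO_MAX_FORMATS ||
        demo_lookup_format(routine->name) != NULL)
        return false;
    demo_formats[demo_nformats++] = routine;
    return true;
}

/* Made-up rule: this format has no use for a delimiter, so reject one. */
static bool
demo_jsonl_validate(const DemoCopyOptions *opts)
{
    return opts->delimiter == NULL;
}

static const DemoCopyFormatRoutine demo_jsonl = {"jsonlines", demo_jsonl_validate};

/* Validate the option set as a whole for the named format. */
static bool
demo_check_options(const char *format, const DemoCopyOptions *opts)
{
    const DemoCopyFormatRoutine *routine = demo_lookup_format(format);

    if (routine == NULL)
        return false;
    return routine->validate_options == NULL ||
           routine->validate_options(opts);
}
```

Validating the options as a whole, rather than one at a time, lets a format reject combinations (here, any delimiter at all) that only become visible once every option has been seen.
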
I'd like to proceed in this direction barring objections. Feedback is
very welcome.

Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0002-Allow-extensions-to-register-custom-format-to-COP.patch | text/x-patch | 19.1 KB |
| v2-0004-Add-test-module-for-COPY-custom-format.patch | text/x-patch | 17.3 KB |
| v2-0003-Add-an-hook-for-custom-COPY-format-option-validat.patch | text/x-patch | 5.6 KB |
| v2-0001-Move-Copy-From-To-StateData-to-copy_state.h.patch | text/x-patch | 26.4 KB |