Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: tomas(at)vondra(dot)me, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2026-06-23 01:06:07
Message-ID: CAD21AoCnA7vayZAOmwVqTSOyWfyBhyxH7mBb4UzjskF-eZ+_Jg@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Thu, Mar 26, 2026 at 6:36 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoCLxUhQ0uBjDKXvCEtJBCfF13Ru_7u-Qrrsu+0PPUqcPQ(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 18 Dec 2025 15:43:07 -0800,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > Looking at these results, it seems that the 0001-from-binary and
> > 0006-to-binary cases are slower across all six results?
>
> Good point. I didn't notice them. But I don't think it's
> related to the patch set, because 0001 doesn't change COPY
> FROM related code; it only changes COPY TO related
> code. And 0006 just adds tests; it doesn't change any
> implementation.
>
>
> BTW, how should we proceed with this proposal? It seems that
> we can't move it forward without attention from PostgreSQL
> committers, but that seems difficult to get.

Sorry for going quiet on this for a while -- I haven't had time to
work on it until now.

After more thought, I'd like to keep the custom-format changes to the
bare minimum and not disturb the existing built-in format processing.

In particular, I've dropped the earlier rework that split
CopyToStateData / CopyFromStateData to hide built-in-specific fields
from extensions. That was my own idea, but I no longer think it pays
off. The fields it hid (raw_buf, line_buf, the input buffers, etc.)
are only ever used by the built-in text/CSV/binary parsers, and a
custom format never touches them, so hiding them gains nothing, while
splitting the struct is invasive to the existing format processing.
Touching the Copy state structs is fine in itself; it's the hiding
that wasn't worth the cost.

Instead, each state struct just gets one opaque pointer for a custom
format to keep its own state, and the existing code paths are left
alone.
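To illustrate the opaque-pointer approach, here is a minimal standalone
sketch. It is not code from the patches: CopyToStateData is reduced to
a hypothetical two-field stand-in, the field name `opaque` and the
"row count" format are made up, and plain calloc/free stand in for
palloc in a memory context. The point is only that core code carries
a void pointer while the format's own callbacks know its real type.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for CopyToStateData.
 * The real struct has many more fields (buffers, attribute lists,
 * etc.); only the opaque per-format pointer matters here. */
typedef struct CopyToStateData
{
    int         natts;          /* number of columns being copied */
    void       *opaque;         /* hypothetical name: custom format's private state */
} CopyToStateData;

typedef CopyToStateData *CopyToState;

/* Private state of a made-up "row count" format. Only this format's
 * callbacks know the type; core code sees just a void pointer. */
typedef struct RowCountState
{
    long        rows_emitted;
} RowCountState;

/* Start callback: allocate the private state and hang it off cstate. */
static void
rowcount_start(CopyToState cstate)
{
    RowCountState *st = calloc(1, sizeof(RowCountState));

    if (st == NULL)
    {
        perror("calloc");
        exit(EXIT_FAILURE);
    }
    cstate->opaque = st;
}

/* Per-row callback: recover the private state from the opaque pointer. */
static void
rowcount_one_row(CopyToState cstate)
{
    RowCountState *st = (RowCountState *) cstate->opaque;

    assert(st != NULL);         /* start callback must have run first */
    st->rows_emitted++;
}

/* End callback: report the result, free the state, clear the pointer. */
static long
rowcount_end(CopyToState cstate)
{
    RowCountState *st = (RowCountState *) cstate->opaque;
    long        n;

    assert(st != NULL);
    n = st->rows_emitted;
    free(st);
    cstate->opaque = NULL;
    return n;
}
```

Calling rowcount_start, then rowcount_one_row twice, then rowcount_end
on the same state returns 2 and leaves the opaque pointer NULL again.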

Updated patches attached:

- 0001 moves CopyFromStateData and CopyToStateData to a new
copy_state.h, so extensions can implement their routines without
including the *_internal.h headers. It also drops file_fdw.c's
dependency on copyfrom_internal.h.
- 0002 introduces the registration API and the opaque per-format
pointer in both structs.
- 0003 adds a callback to validate the COPY options as a whole, called
after all options are processed.
- 0004 adds the regression tests.
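A rough standalone sketch of how a registration API (0002) and a
whole-options validation callback (0003) could fit together follows.
None of the names here (CopyFormatRoutine, register_copy_format,
lookup_copy_format, process_copy_options, the "jsonlines" format and
its options) come from the v2 patches; they are hypothetical, and the
fixed-size array is just the simplest possible registry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option representation: one parsed "name value" pair. */
typedef struct CopyOption
{
    const char *name;
    const char *value;
} CopyOption;

/* Hypothetical routine table a custom format would register. */
typedef struct CopyFormatRoutine
{
    const char *name;
    /* Called once after every option has been processed, so it can
     * check combinations of options. Returns false to reject them. */
    bool        (*validate_options) (const CopyOption *opts, size_t nopts);
} CopyFormatRoutine;

#define MAX_COPY_FORMATS 8

static const CopyFormatRoutine *registered[MAX_COPY_FORMATS];
static size_t n_registered = 0;

/* Hypothetical registration: rejects duplicate names and overflow. */
static bool
register_copy_format(const CopyFormatRoutine *routine)
{
    assert(routine != NULL && routine->name != NULL);
    if (n_registered >= MAX_COPY_FORMATS)
        return false;
    for (size_t i = 0; i < n_registered; i++)
        if (strcmp(registered[i]->name, routine->name) == 0)
            return false;
    registered[n_registered++] = routine;
    return true;
}

static const CopyFormatRoutine *
lookup_copy_format(const char *name)
{
    for (size_t i = 0; i < n_registered; i++)
        if (strcmp(registered[i]->name, name) == 0)
            return registered[i];
    return NULL;
}

/* Made-up format: options "pretty" and "compact" are mutually exclusive. */
static bool
jsonlines_validate(const CopyOption *opts, size_t nopts)
{
    bool        pretty = false;
    bool        compact = false;

    for (size_t i = 0; i < nopts; i++)
    {
        if (strcmp(opts[i].name, "pretty") == 0)
            pretty = true;
        else if (strcmp(opts[i].name, "compact") == 0)
            compact = true;
    }
    return !(pretty && compact);
}

static const CopyFormatRoutine jsonlines_routine = {"jsonlines", jsonlines_validate};

/* Mimics the core side: look up the format, then run the whole-options
 * check once, after per-option parsing (elided here) is done. */
static bool
process_copy_options(const char *format, const CopyOption *opts, size_t nopts)
{
    const CopyFormatRoutine *r = lookup_copy_format(format);

    if (r == NULL)
        return false;           /* unknown format */
    if (r->validate_options != NULL && !r->validate_options(opts, nopts))
        return false;
    return true;
}
```

Running the check once, after every option is processed, matters
because a per-option handler sees one option at a time and cannot
reject a combination such as the mutually exclusive pair above.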

I'd like to proceed in this direction barring objections. Feedback is
very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v2-0001-Move-Copy-From-To-StateData-to-copy_state.h.patch text/x-patch 26.4 KB
v2-0002-Allow-extensions-to-register-custom-format-to-COP.patch text/x-patch 19.1 KB
v2-0003-Add-an-hook-for-custom-COPY-format-option-validat.patch text/x-patch 5.6 KB
v2-0004-Add-test-module-for-COPY-custom-format.patch text/x-patch 17.3 KB
