Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: tomas(at)vondra(dot)me, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-12-18 23:43:07
Message-ID: CAD21AoCLxUhQ0uBjDKXvCEtJBCfF13Ru_7u-Qrrsu+0PPUqcPQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 1, 2025 at 6:41 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <c36d218a-bb38-42b9-9076-cb75b8984a39(at)vondra(dot)me>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 17 Nov 2025 18:04:46 +0100,
> Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> > I got pinged about this patch off-list. I won't have capacity to do a
> > proper review anytime soon, but I got a bit of time to do a simple
> > benchmark (which seems useful, as that was one of the concerns in this
> > thread).
>
> Thanks!!!
>
> I also ran the benchmark under the same conditions on my Mac mini:
>
> Machine:
>
> * Apple M1 (8 core)
> * Memory: 16GB
> * macOS Sequoia (15.6)
>
> Parameters:
>
> * N integer columns: 1 10 100
> * N rows: 10 100 1000 10000 100000 1000000
> * Formats: text csv binary
> * Operations: FROM TO
> * Patches: 0001 0002 0003 0004 0005 0006
>
> I ran each parameter set 5 times and chose the median
> elapsed time.
>
> I used the following script, which is also attached:
> https://gitlab.com/ktou/pg-bench/-/blob/main/copy-format-extendable/run.sh
>
> I repeated the whole measurement 6 times. See the attached
> mac-mini-result-${N}.{csv,pdf}.
>
> Each PDF has 3 columns: text, csv and binary formats
> from left to right.
>
> 1st row uses COPY FROM and 0001 patch.
> 2nd row uses COPY TO and 0001 patch.
>
> 3rd row uses COPY FROM and 0002 patch.
> 4th row uses COPY TO and 0002 patch.
>
> ...
>
> Each heatmap visualizes (${elapsed_time_patch} /
> ${elapsed_time_master}) * 100. Values above 100 (red) mean
> slower and values below 100 (blue) mean faster.
>
> It seems that they don't show any reproducible trends.
>
> For example, the binary cases in mac-mini-result-2.pdf show
> that patched cases are always slower but the binary cases in
> mac-mini-result-{1,6}.pdf show that most patched cases are
> faster. The binary cases in mac-mini-result-{1,5,6}.pdf show
> that 0006 patch is slower than 0005 patch but the binary
> cases in mac-mini-result-{3,4}.pdf don't show it.
>
> Another example, the text cases in mac-mini-result-1.pdf
> show that patched cases are always slower but the text cases
> in mac-mini-result-1.pdf show that most patched cases are
> faster.
>
> I hope that these numbers help move this proposal forward.

Thank you for sharing the performance test results! I'll run the same
benchmark tests in my environment.

Looking at these results, it seems that the 0001-from-binary and
0006-to-binary cases are slower across all six results?
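
A minimal sketch of how such a check could be done, assuming the per-run
elapsed times are available as plain lists of numbers. The function names
and the sample timings are hypothetical and are not taken from the attached
results or from run.sh. It computes the same value the heatmaps show (median
patched time / median master time * 100) and tests whether a case is above
100 in every result.

```python
from statistics import median


def relative_time(patched_runs, master_runs):
    """Return (median patched / median master) * 100, as in the heatmaps."""
    return median(patched_runs) / median(master_runs) * 100


def consistently_slower(ratios, threshold=100.0):
    """True only if every result's ratio is above the threshold."""
    return all(r > threshold for r in ratios)


# Hypothetical elapsed times in milliseconds, 5 runs each.
master = [10.2, 10.0, 10.5, 9.9, 10.1]
patched = [10.8, 10.6, 11.0, 10.4, 10.7]

ratio = relative_time(patched, master)

# Hypothetical ratios for one case (e.g. one format/operation/patch
# combination) collected from six separate results.
six_results = [ratio, 101.2, 103.4, 100.6, 102.0, 104.1]
print(consistently_slower(six_results))
```

A case that is above 100 in only some of the six results would return False
here, which matches the "no reproducible trend" reading.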

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
