Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: tomas(at)vondra(dot)me
Cc: sawada(dot)mshk(at)gmail(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-12-02 02:39:57
Message-ID: 20251202.113957.631944095477627464.kou@clear-code.com
Lists: pgsql-hackers

Hi,

In <c36d218a-bb38-42b9-9076-cb75b8984a39(at)vondra(dot)me>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 17 Nov 2025 18:04:46 +0100,
Tomas Vondra <tomas(at)vondra(dot)me> wrote:

> I got pinged about this patch off-list. I won't have capacity to do a
> proper review, anytime soon, but I got a bit of time to do a simple
> benchmark (which seems useful as that was one of the concerns in this
> thread, it seems).

Thanks!!!

I also ran the benchmark under the same conditions on my Mac mini:

Machine:

* Apple M1 (8 core)
* Memory: 16GB
* macOS Sequoia (15.6)

Parameters:

* N integer columns: 1 10 100
* N rows: 10 100 1000 10000 100000 1000000
* Formats: text csv binary
* Operations: FROM TO
* Patches: 0001 0002 0003 0004 0005 0006

I ran each parameter set 5 times and chose the median
elapsed time.
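
The selection step above can be sketched as follows. This is only a
minimal Python illustration, not the attached run.sh; the function
name median_elapsed and the sample timings are hypothetical.

```python
import statistics


def median_elapsed(timings_ms):
    """Return the median of the elapsed times from repeated runs."""
    if not timings_ms:
        raise ValueError("at least one timing is required")
    return statistics.median(timings_ms)


# Hypothetical elapsed times (ms) for 5 runs of one parameter set.
# The median ignores the single outlier (250.0) that would skew a mean.
runs = [103.2, 98.7, 101.5, 250.0, 99.9]
print(median_elapsed(runs))
```

Taking the median rather than the mean keeps one noisy run from
dominating the reported time.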

I used
https://gitlab.com/ktou/pg-bench/-/blob/main/copy-format-extendable/run.sh
for this. I have attached it.

I repeated the whole measurement 6 times. See the attached
mac-mini-result-${N}.{csv,pdf}.

Each PDF has 3 columns. They are the text, csv and binary
formats from left to right.

The 1st row uses COPY FROM with the 0001 patch.
The 2nd row uses COPY TO with the 0001 patch.

The 3rd row uses COPY FROM with the 0002 patch.
The 4th row uses COPY TO with the 0002 patch.

...

Each heatmap visualizes (${elapsed_time_patch} /
${elapsed_time_master}) * 100. A value above 100 (red) means
the patched build is slower than master and a value below 100
(blue) means it is faster.
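
The heatmap value described above can be computed as in the
following sketch. It is an illustration only, not the script that
generated the PDFs; the function names relative_percent and
classify and the sample timings are hypothetical.

```python
def relative_percent(elapsed_patch, elapsed_master):
    """Return (elapsed_patch / elapsed_master) * 100."""
    if elapsed_master <= 0:
        raise ValueError("elapsed_master must be positive")
    return elapsed_patch / elapsed_master * 100


def classify(percent):
    """Map a heatmap value to the color legend: >100 red, <100 blue."""
    if percent > 100:
        return "slower (red)"
    if percent < 100:
        return "faster (blue)"
    return "same"


# Hypothetical median elapsed times (ms) for one heatmap cell.
value = relative_percent(120.0, 80.0)
print(value, classify(value))
```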

It seems that they don't show any reproducible trends.

For example, the binary cases in mac-mini-result-2.pdf show
that patched cases are always slower but the binary cases in
mac-mini-result-{1,6}.pdf show that most patched cases are
faster. The binary cases in mac-mini-result-{1,5,6}.pdf show
that 0006 patch is slower than 0005 patch but the binary
cases in mac-mini-result-{3,4}.pdf don't show it.

As another example, the text cases in mac-mini-result-1.pdf
show that patched cases are always slower but the text cases
in mac-mini-result-1.pdf show that most patched cases are
faster.

I hope that these numbers help move this proposal forward.

Thanks,
--
kou

Attachment Content-Type Size
unknown_filename text/plain 2.2 KB
mac-mini-result-1.csv text/csv 67.2 KB
mac-mini-result-1.pdf application/pdf 823.5 KB
mac-mini-result-2.csv text/csv 66.6 KB
mac-mini-result-2.pdf application/pdf 814.9 KB
mac-mini-result-3.csv text/csv 67.1 KB
mac-mini-result-3.pdf application/pdf 823.8 KB
mac-mini-result-4.csv text/csv 66.7 KB
mac-mini-result-4.pdf application/pdf 824.2 KB
mac-mini-result-5.csv text/csv 67.2 KB
mac-mini-result-5.pdf application/pdf 823.4 KB
mac-mini-result-6.csv text/csv 67.2 KB
mac-mini-result-6.pdf application/pdf 823.9 KB
