Re: Make COPY format extendable: Extract COPY TO format implementations

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Sutou Kouhei <kou(at)clear-code(dot)com>, andrew(at)dunslane(dot)net, zhjwpku(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-01-25 04:53:30
Message-ID: ZbHpSuvzyM9THZcl@paquier.xyz
Lists: pgsql-hackers

On Thu, Jan 25, 2024 at 01:36:03PM +0900, Masahiko Sawada wrote:
> Hmm I can see a similar trend that Sutou-san had; the binary format got
> slightly faster whereas both text and csv formats have a small regression
> (4%~5%). I think that the improvement for binary came from the fact
> that we removed "if (cstate->opts.binary)" branches from the original
> CopyOneRowTo(). I've experimented with a similar optimization for csv
> and text format; have different callbacks for text and csv format and
> remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> for that. Here are results:
>
> HEAD w/ 0001 patch + remove branches:
> binary 2824.502 ms
> text 2715.264 ms
> csv 2803.381 ms
>
> The numbers look better now. I'm not sure these are within a noise
> range but it might be worth considering having different callbacks for
> text and csv formats.

Interesting.

Your numbers imply a 0.3% speedup for text, 0.7% speedup for csv and
0.9% speedup for binary, which may be around the noise range assuming
a ~1% range. Even if the gains are within noise, there is no regression,
so the split seems worth the duplication IMO. The patch had better
document the reason why the split is done, as well.
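
To make the idea of the split concrete, here is a minimal standalone
sketch, not code from the patch. All names (DemoCopyState, DemoOneRowFn,
demo_begin, demo_text_one_row, demo_csv_one_row) are hypothetical, and
the quoting is deliberately simplified (no escaping at all). The point
is only that the format is checked once, when the callback is chosen,
rather than once per row:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical per-row callback: formats one row into buf and returns the
 * number of bytes written (excluding the NUL), or -1 if buf is too small.
 */
typedef int (*DemoOneRowFn) (char *buf, size_t buflen,
                             const char *const *fields, size_t nfields);

/* Hypothetical stand-in for the COPY state; not PostgreSQL's real struct. */
typedef struct DemoCopyState
{
    DemoOneRowFn one_row;       /* chosen once, before the row loop */
} DemoCopyState;

/* Shared helper: writes the fields using the given separator and wrapper. */
static int
demo_write_row(char *buf, size_t buflen, const char *const *fields,
               size_t nfields, const char *sep, const char *quote)
{
    size_t      used = 0;
    int         n;

    for (size_t i = 0; i < nfields; i++)
    {
        n = snprintf(buf + used, buflen - used, "%s%s%s%s",
                     i > 0 ? sep : "", quote, fields[i], quote);
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;          /* output would be truncated */
        used += (size_t) n;
    }

    n = snprintf(buf + used, buflen - used, "\n");
    if (n < 0 || (size_t) n >= buflen - used)
        return -1;
    used += (size_t) n;

    return (int) used;
}

/* Text path: tab-separated, no "is this csv?" test inside the row code. */
static int
demo_text_one_row(char *buf, size_t buflen,
                  const char *const *fields, size_t nfields)
{
    return demo_write_row(buf, buflen, fields, nfields, "\t", "");
}

/* CSV path: every field quoted; embedded quotes are NOT escaped here. */
static int
demo_csv_one_row(char *buf, size_t buflen,
                 const char *const *fields, size_t nfields)
{
    return demo_write_row(buf, buflen, fields, nfields, ",", "\"");
}

/* The only place where the format option is evaluated. */
static void
demo_begin(DemoCopyState *state, int csv_mode)
{
    state->one_row = csv_mode ? demo_csv_one_row : demo_text_one_row;
}
```

A caller would run demo_begin() once and then call state.one_row() for
each row, so the row loop itself never tests the format. Whether this
is measurably faster than a well-predicted branch is exactly what the
benchmarks need to show.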

CopyFromTextOneRow() also has specific branches for binary and
non-binary removed in 0005, so assuming that I/O is not a bottleneck,
the operation would be faster because we would not evaluate this "if"
condition for each row. Wouldn't we also see improvements for COPY
FROM with short row values, say when mounting PGDATA into a
tmpfs/ramfs?
--
Michael
