Re: Make COPY format extendable: Extract COPY TO format implementations

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Sutou Kouhei <kou(at)clear-code(dot)com>, sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-02-06 01:01:36
Message-ID: ZcGE8LrjGW8pmtOf@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 05, 2024 at 10:21:18AM -0800, Andres Freund wrote:
> Have you benchmarked the performance effects of 2889fd23be5 ? I'd not at all
> be surprised if it lead to a measurable performance regression.

Yes, I was looking at runtimes and some profiles around CopyOneRowTo()
to see the effects that this has yesterday. The principal point of
contention is CopyOneRowTo() where the callback is called once per
attribute, so more attributes stress it more. The method I've used is
described in [1], where I've used up to 50 int attributes (fixed value
size to limit appendBinaryStringInfo) with 5 million rows, with
shared_buffers large enough that all the data fits in it, while
prewarming the whole. Postgres runs on a tmpfs, and COPY TO is
redirected to /dev/null.

For reference, I still have some reports lying around (-g attached to
the backend process running the COPY TO queries with text format), so
here you go:
* At 95fb5b49024a:
- 83.04% 11.46% postgres postgres [.] CopyOneRowTo
- 71.58% CopyOneRowTo
- 30.37% OutputFunctionCall
+ 27.77% int4out
+ 13.18% CopyAttributeOutText
+ 10.19% appendBinaryStringInfo
3.76% 0xffffa7096234
2.78% 0xffffa7096214
+ 2.49% CopySendEndOfRow
1.21% int4out
0.83% memcpy(at)plt
0.76% 0xffffa7094ba8
0.75% 0xffffa7094ba4
0.69% pgstat_progress_update_param
0.57% enlargeStringInfo
0.52% 0xffffa7096204
0.52% 0xffffa7094b8c
+ 11.46% _start
* At 2889fd23be56:
- 83.53% 14.24% postgres postgres [.] CopyOneRowTo
- 69.29% CopyOneRowTo
- 29.89% OutputFunctionCall
+ 27.43% int4out
- 12.89% CopyAttributeOutText
pg_server_to_any
+ 9.31% appendBinaryStringInfo
3.68% 0xffffa6940234
+ 2.74% CopySendEndOfRow
2.43% 0xffffa6940214
1.36% int4out
0.74% 0xffffa693eba8
0.73% pgstat_progress_update_param
0.65% memcpy(at)plt
0.53% MemoryContextReset
+ 14.24% _start

If you have concerns about that, I'm OK to revert, I'm not wedded to
this level of control. Note that I've actually seen *better*
runtimes.

[1]: https://www.postgresql.org/message-id/Zbr6piWuVHDtFFOl@paquier.xyz

> I think callbacks for individual attributes is the wrong approach - the
> dispatch needs to happen at a higher level, otherwise there are too many
> indirect function calls.

Hmm. Do you have concerns about v13 posted on [2] then? If yes, then
I'd assume that this shuts down the whole thread or that it needs a
completely different approach, because we will multiply indirect
function calls that can control how data is generated for each row,
which is the original case that Sutou-san wanted to tackle. There
could be many indirect calls with custom callbacks that control how
things should be processed at row-level, and COPY likes doing work
with loads of data. The End, Start and In/OutFunc callbacks are
called only once per query, so these don't matter AFAIU.

[2]: https://www.postgresql.org/message-id/ZcFz59nJjQNjwgX0@paquier.xyz
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2024-02-06 01:41:25 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message Kyotaro Horiguchi 2024-02-06 00:39:09 Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row