From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Sutou Kouhei <kou(at)clear-code(dot)com>, sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2024-02-06 01:41:25 |
Message-ID: | 20240206014125.qofww7ew3dx3v3uk@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2024-02-06 10:01:36 +0900, Michael Paquier wrote:
> On Mon, Feb 05, 2024 at 10:21:18AM -0800, Andres Freund wrote:
> > Have you benchmarked the performance effects of 2889fd23be5 ? I'd not at all
> > be surprised if it lead to a measurable performance regression.
>
> Yes, I was looking at runtimes and some profiles around CopyOneRowTo()
> to see the effects that this has yesterday. The principal point of
> contention is CopyOneRowTo() where the callback is called once per
> attribute, so more attributes stress it more.
Right.
> If you have concerns about that, I'm OK to revert, I'm not wedded to
> this level of control. Note that I've actually seen *better*
> runtimes.
I'm somewhat worried that handling the different formats at that level will
make it harder to improve copy performance - it's quite attrociously slow
right now. The more we reduce the per-row/field overhead, the more the
dispatch overhead will matter.
> [1]: https://www.postgresql.org/message-id/Zbr6piWuVHDtFFOl@paquier.xyz
>
> > I think callbacks for individual attributes is the wrong approach - the
> > dispatch needs to happen at a higher level, otherwise there are too many
> > indirect function calls.
>
> Hmm. Do you have concerns about v13 posted on [2] then?
As is I'm indeed not a fan. It imo doesn't make sense to have an indirect
dispatch for *both* ->copy_attribute_out *and* ->CopyToOneRow. After all, when
in ->CopyToOneRow for text, we could know that we need to call
CopyAttributeOutText etc.
> If yes, then I'd assume that this shuts down the whole thread or that it
> needs a completely different approach, because we will multiply indirect
> function calls that can control how data is generated for each row, which is
> the original case that Sutou-san wanted to tackle.
I think it could be rescued fairly easily - remove the dispatch via
->copy_attribute_out(). To avoid duplicating code you could use a static
inline function that's used with constant arguments by both csv and text mode.
I think it might also be worth ensuring that future patches can move branches
like
if (cstate->encoding_embeds_ascii)
if (cstate->need_transcoding)
into the choice of per-row callback.
> The End, Start and In/OutFunc callbacks are called only once per query, so
> these don't matter AFAIU.
Right.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2024-02-06 02:05:04 | confusing / inefficient "need_transcoding" handling in copy |
Previous Message | Michael Paquier | 2024-02-06 01:01:36 | Re: Make COPY format extendable: Extract COPY TO format implementations |