Re: Make COPY format extendable: Extract COPY TO format implementations

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Sutou Kouhei <kou(at)clear-code(dot)com>, sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-02-06 01:41:25
Message-ID: 20240206014125.qofww7ew3dx3v3uk@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-02-06 10:01:36 +0900, Michael Paquier wrote:
> On Mon, Feb 05, 2024 at 10:21:18AM -0800, Andres Freund wrote:
> > Have you benchmarked the performance effects of 2889fd23be5 ? I'd not at all
> > be surprised if it lead to a measurable performance regression.
>
> Yes, I was looking at runtimes and some profiles around CopyOneRowTo()
> to see the effects that this has yesterday. The principal point of
> contention is CopyOneRowTo() where the callback is called once per
> attribute, so more attributes stress it more.

Right.

> If you have concerns about that, I'm OK to revert, I'm not wedded to
> this level of control. Note that I've actually seen *better*
> runtimes.

I'm somewhat worried that handling the different formats at that level will
make it harder to improve copy performance - it's quite attrociously slow
right now. The more we reduce the per-row/field overhead, the more the
dispatch overhead will matter.

> [1]: https://www.postgresql.org/message-id/Zbr6piWuVHDtFFOl@paquier.xyz
>
> > I think callbacks for individual attributes is the wrong approach - the
> > dispatch needs to happen at a higher level, otherwise there are too many
> > indirect function calls.
>
> Hmm. Do you have concerns about v13 posted on [2] then?

As is I'm indeed not a fan. It imo doesn't make sense to have an indirect
dispatch for *both* ->copy_attribute_out *and* ->CopyToOneRow. After all, when
in ->CopyToOneRow for text, we could know that we need to call
CopyAttributeOutText etc.

> If yes, then I'd assume that this shuts down the whole thread or that it
> needs a completely different approach, because we will multiply indirect
> function calls that can control how data is generated for each row, which is
> the original case that Sutou-san wanted to tackle.

I think it could be rescued fairly easily - remove the dispatch via
->copy_attribute_out(). To avoid duplicating code you could use a static
inline function that's used with constant arguments by both csv and text mode.

I think it might also be worth ensuring that future patches can move branches
like
if (cstate->encoding_embeds_ascii)
if (cstate->need_transcoding)
into the choice of per-row callback.

> The End, Start and In/OutFunc callbacks are called only once per query, so
> these don't matter AFAIU.

Right.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2024-02-06 02:05:04 confusing / inefficient "need_transcoding" handling in copy
Previous Message Michael Paquier 2024-02-06 01:01:36 Re: Make COPY format extendable: Extract COPY TO format implementations