Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Sutou Kouhei <kou(at)clear-code(dot)com>, andrew(at)dunslane(dot)net, zhjwpku(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-01-25 05:28:38
Message-ID: CAD21AoAkSzEbkUp8fg1pzCFDFiZNPWXM+=rNWXhgC8TYp88Uvg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 25, 2024 at 1:53 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Thu, Jan 25, 2024 at 01:36:03PM +0900, Masahiko Sawada wrote:
> > Hmm I can see a similar trend that Sutou-san had; the binary format got
> > slightly faster whereas both text and csv formats have a small regression
> > (4%~5%). I think that the improvement for binary came from the fact
> > that we removed "if (cstate->opts.binary)" branches from the original
> > CopyOneRowTo(). I've experimented with a similar optimization for csv
> > and text format; have different callbacks for text and csv format and
> > remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> > for that. Here are results:
> >
> > HEAD w/ 0001 patch + remove branches:
> > binary 2824.502 ms
> > text 2715.264 ms
> > csv 2803.381 ms
> >
> > The numbers look better now. I'm not sure these are within a noise
> > range but it might be worth considering having different callbacks for
> > text and csv formats.
>
> Interesting.
>
> Your numbers imply a 0.3% speedup for text, 0.7% speedup for csv and
> 0.9% speedup for binary, which may be around the noise range assuming
> a ~1% range. While this does not imply a regression, that seems worth
> the duplication IMO.

Agreed. In addition, now that each format routine has its own
callbacks, we may be able to add further optimizations dedicated to
each format type in the future.
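
To illustrate the idea only (this is not the attached patch), here is a
minimal, self-contained sketch of per-format callbacks. All names
(DemoCopyRoutine, demo_text_one_row, demo_csv_one_row, demo_get_routine)
are hypothetical, and escaping and real CSV quoting rules are omitted.
The format is chosen once, so the per-row code contains no
"if (csv_mode)" test:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-row callback: writes one row into buf, returns bytes
 * written (excluding the terminating NUL), or 0 if it does not fit. */
typedef size_t (*DemoOneRowFn) (char *buf, size_t buflen,
                                const char *const *fields, int nfields);

/* Hypothetical callback table; the real patch's struct may differ. */
typedef struct DemoCopyRoutine
{
    DemoOneRowFn one_row;
} DemoCopyRoutine;

/* Text format: tab-separated fields (escaping omitted for brevity). */
static size_t
demo_text_one_row(char *buf, size_t buflen,
                  const char *const *fields, int nfields)
{
    size_t      used = 0;

    for (int i = 0; i < nfields; i++)
    {
        int         n = snprintf(buf + used, buflen - used, "%s%s",
                                 i > 0 ? "\t" : "", fields[i]);

        if (n < 0 || (size_t) n >= buflen - used)
            return 0;           /* output would not fit */
        used += (size_t) n;
    }
    if (used + 2 > buflen)
        return 0;
    buf[used++] = '\n';
    buf[used] = '\0';
    return used;
}

/* CSV format: comma-separated, every field quoted. This is a
 * simplification; embedded quotes are not doubled here. */
static size_t
demo_csv_one_row(char *buf, size_t buflen,
                 const char *const *fields, int nfields)
{
    size_t      used = 0;

    for (int i = 0; i < nfields; i++)
    {
        int         n = snprintf(buf + used, buflen - used, "%s\"%s\"",
                                 i > 0 ? "," : "", fields[i]);

        if (n < 0 || (size_t) n >= buflen - used)
            return 0;           /* output would not fit */
        used += (size_t) n;
    }
    if (used + 2 > buflen)
        return 0;
    buf[used++] = '\n';
    buf[used] = '\0';
    return used;
}

static const DemoCopyRoutine demo_text_routine = {demo_text_one_row};
static const DemoCopyRoutine demo_csv_routine = {demo_csv_one_row};

/* The only format test: evaluated once per COPY, not once per row. */
static const DemoCopyRoutine *
demo_get_routine(int csv_mode)
{
    return csv_mode ? &demo_csv_routine : &demo_text_routine;
}
```

The cost is two near-identical loops, which is the duplication mentioned
above; the benefit is that each loop can later be tuned for its format
without touching the other.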

> The patch had better document the reason why the
> split is done, as well.

+1

>
> CopyFromTextOneRow() also has its specific branches for binary and
> non-binary removed in 0005, so assuming that I/O is not a bottleneck,
> the operation would be faster because we would not evaluate this "if"
> condition for each row. Wouldn't we also see improvements for COPY
> FROM with short row values, say when mounting PGDATA into a
> tmpfs/ramfs?

Probably. Seems worth evaluating.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
