Re: Performance degradation on concurrent COPY into a single relation in PG16.

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance degradation on concurrent COPY into a single relation in PG16.
Date: 2023-07-27 00:17:20
Message-ID: CAApHDvq78LRGVAO=EMbaJQ8bn1jBLQbC6w0JY=mMjnHAFxj_Aw@mail.gmail.com
Lists: pgsql-hackers

> On 2023-07-25 23:37:08 +1200, David Rowley wrote:
> > On Tue, 25 Jul 2023 at 17:34, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > HEAD: 812.690
> > >
> > > your patch: 821.354
> > >
> > > strtoint from 8692f6644e7: 824.543
> > >
> > > strtoint from 6b423ec677d^: 806.678
> >
> > I'm surprised to see the imul version is faster. It's certainly not
> > what we found when working on 6b423ec67.
>
> What CPUs did you test it on? I'd not be surprised if this were heavily
> dependent on the microarchitecture.

This was on an AMD 3990x.

> One idea I had was to add a fastpath that won't parse all strings, but will
> parse the strings that we would generate, and fall back to the more general
> variant if it fails. See the attached, rough, prototype:

There were a couple of problems with fastpath.patch. You need to
reset ptr to the start of the string at the top of the slow path, and
the if (neg) part of the fast path section was using tmp instead of
tmp_s.

I fixed that up and made two versions of the patch: one using the
overflow functions (pg_strtoint_fastpath1.patch) and one testing
whether the number is going to overflow, the same as current master
(pg_strtoint_fastpath2.patch).
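
To make the idea concrete, here is a minimal, self-contained sketch of the
fast-path/slow-path split. It is not the code from either patch. The names
parse_int32_fast and parse_int32_slow are hypothetical, and the slow path is
a plain strtol() stand-in rather than PostgreSQL's general parser. The fast
path accepts only an optional '-' followed by decimal digits and tests for
overflow before multiplying (the approach described above for
pg_strtoint_fastpath2.patch). Anything else restarts from the beginning of
the string in the slow path.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the general parser. It only wraps strtol()
 * and rejects trailing characters; it does not mirror PostgreSQL's rules.
 */
static bool
parse_int32_slow(const char *s, int32_t *result)
{
    char   *end;
    long    val;

    errno = 0;
    val = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' ||
        val < INT32_MIN || val > INT32_MAX)
        return false;
    *result = (int32_t) val;
    return true;
}

/*
 * Fast path: optional '-' followed by decimal digits only. On anything
 * unexpected, fall back to the slow path, restarting from the original
 * string rather than from wherever the fast path stopped.
 */
static bool
parse_int32_fast(const char *s, int32_t *result)
{
    const char *ptr = s;
    uint32_t    tmp = 0;
    bool        neg = false;

    if (*ptr == '-')
    {
        neg = true;
        ptr++;
    }

    /* require at least one digit */
    if (*ptr < '0' || *ptr > '9')
        goto slow;

    while (*ptr >= '0' && *ptr <= '9')
    {
        /* test before multiplying so tmp * 10 + 9 always fits in uint32 */
        if (tmp > (uint32_t) (-(INT32_MIN / 10)))
            goto slow;
        tmp = tmp * 10 + (uint32_t) (*ptr++ - '0');
    }

    /* trailing characters are left to the slow path */
    if (*ptr != '\0')
        goto slow;

    if (neg)
    {
        if (tmp > (uint32_t) INT32_MAX + 1)
            goto slow;
        *result = (tmp == (uint32_t) INT32_MAX + 1) ? INT32_MIN : -(int32_t) tmp;
    }
    else
    {
        if (tmp > (uint32_t) INT32_MAX)
            goto slow;
        *result = (int32_t) tmp;
    }
    return true;

slow:
    /* pass s, not ptr: the slow path must see the whole input */
    return parse_int32_slow(s, result);
}
```

Calling parse_int32_fast("12345", &v) stays entirely in the fast path, while
an input such as " 42" falls through to the slow path. A variant in the style
of pg_strtoint_fastpath1.patch would presumably replace the pre-multiply
range test with overflow-checked arithmetic.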

AMD 3990x:

master + fix_COPY_DEFAULT.patch:
latency average = 525.226 ms

master + fix_COPY_DEFAULT.patch + pg_strtoint_fastpath1.patch:
latency average = 488.171 ms

master + fix_COPY_DEFAULT.patch + pg_strtoint_fastpath2.patch:
latency average = 481.827 ms

Apple M2 Pro:

master + fix_COPY_DEFAULT.patch:
latency average = 348.433 ms

master + fix_COPY_DEFAULT.patch + pg_strtoint_fastpath1.patch:
latency average = 336.778 ms

master + fix_COPY_DEFAULT.patch + pg_strtoint_fastpath2.patch:
latency average = 335.992 ms

Zen 4 7945HX CPU:

master + fix_COPY_DEFAULT.patch:
latency average = 296.881 ms

master + fix_COPY_DEFAULT.patch + pg_strtoint_fastpath1.patch:
latency average = 287.052 ms

master + fix_COPY_DEFAULT.patch + pg_strtoint_fastpath2.patch:
latency average = 280.742 ms

The M2 chip does not seem to be clearly faster with the fastpath2
method of overflow checking, but both AMD CPUs seem pretty set on
fastpath2 being faster.
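
For comparing the figures above across machines, it can help to express each
patched result as a relative reduction against its own baseline. The helper
below is only an illustrative sketch; the function name is hypothetical.

```c
/*
 * Percentage by which patched_ms improves on baseline_ms.
 * Positive values mean the patched build had lower average latency.
 */
static double
latency_reduction_pct(double baseline_ms, double patched_ms)
{
    return (baseline_ms - patched_ms) / baseline_ms * 100.0;
}
```

Applied to the numbers above, fastpath1 and fastpath2 come out at roughly
7.1% and 8.3% on the 3990x, 3.3% and 3.6% on the M2 Pro, and 3.3% and 5.4%
on the 7945HX.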

It would be really good if someone with a newish Intel CPU could test
this too.

David

Attachment Content-Type Size
pg_strtoint_fastpath1.patch text/plain 1.8 KB
pg_strtoint_fastpath2.patch text/plain 1.8 KB
