Re: Performance degradation on concurrent COPY into a single relation in PG16.

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance degradation on concurrent COPY into a single relation in PG16.
Date: 2023-08-01 12:55:13
Message-ID: CAApHDvo3ARuEYGk1hMxD+2oOjUFADdxrfWfF17BRWDGugeHqJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 1 Aug 2023 at 13:25, Andres Freund <andres(at)anarazel(dot)de> wrote:
> There's a lot of larger numbers in the file, which likely reduces the impact
> some. And there's the overhead of actually inserting the rows into the table,
> making the difference appear smaller than it is.

It might be worth special casing the first digit as we can save doing
the multiplication by 10 and the overflow checks on the first digit.
That should make it slightly faster to parse smaller numbers.

> COPY (SELECT generate_series(1, 1000) a, 10 b, 20 c, 30 d, 40 e, 50 f FROM generate_series(1, 10000)) TO '/tmp/lotsaints_wide.copy';
>
> psql -c 'DROP TABLE IF EXISTS lotsaints_wide; CREATE UNLOGGED TABLE lotsaints_wide(a int, b int, c int, d int, e int, f int);' && \
> pgbench -n -P1 -f <( echo "COPY lotsaints_wide FROM '/tmp/lotsaints_wide.copy' WHERE false") -t 5
>
> 15: 2992.605
> HEAD: 3325.201
> fastpath1.patch 2932.606
> fastpath2.patch 2783.915
>
> Interestingly fastpath1 is slower now, even though it wasn't with earlier
> patches (which still is repeatable). I do not have the foggiest as to why.

I'm glad to see that.

I've adjusted the patch to add the fast path for the 16 and 64-bit
versions of the function. I also added the special case for
processing the first digit, which looks like:

/* process the first digit */
digit = (*ptr - '0');

if (likely(digit < 10))
{
ptr++;
tmp = digit;
}

/* process remaining digits */
for (;;)

I tried adding the "at least 1 digit check" by adding an else { goto
slow; } in the above code, but it seems to generate slower code than
just checking if (unlikely(ptr == s)) { goto slow; } after the loop.

I also noticed that I wasn't getting the same performance after
adjusting the 16 and 64 bit versions. I assume that's down to code
alignment, but unsure of that. I ended up adjusting all the "while
(*ptr)" loops into "for (;;)" loops
since the NUL char check is handled by the "else break;". I also
removed the needless NUL char check in the isspace loops. It can't be
isspace and '\0'. I also replaced the isdigit() function call and
replaced it for manually checking the digit range. I see my compiler
(gcc12.2) effectively generates the same code as the unsigned char
fast path version checking if (digit < 10). Once I did that, I got the
performance back again.

With your new test with the small-sized ints, I get:

REL_15_STABLE:
latency average = 1696.390 ms

master @ d3a38318a
latency average = 1928.803 ms

master + fastpath1.patch:
latency average = 1634.736 ms

master + fastpath2.patch:
latency average = 1628.704 ms

master + fastpath3.patch
latency average = 1590.417 ms

I see no particular reason not to go ahead with the attached patch and
get this thread closed off. Any objections?

David

Attachment Content-Type Size
pg_strtoint_fastpath3.patch application/octet-stream 9.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2023-08-01 13:14:54 Re: Incorrect handling of OOM in WAL replay leading to data loss
Previous Message David Rowley 2023-08-01 12:21:42 Re: Performance degradation on concurrent COPY into a single relation in PG16.