Re: Performance degradation on concurrent COPY into a single relation in PG16.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance degradation on concurrent COPY into a single relation in PG16.
Date: 2023-07-11 18:51:59
Message-ID: 20230711185159.v2j5vnyrtodnwhgz@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-07-03 11:55:13 +0900, Masahiko Sawada wrote:
> While testing PG16, I observed that in PG16 there is a big performance
> degradation in concurrent COPY into a single relation with 2 - 16
> clients in my environment. I've attached a test script that measures
> the execution time of COPYing 5GB data in total to the single relation
> while changing the number of concurrent insertions, in PG16 and PG15.
> Here are the results on my environment (EC2 instance, RHEL 8.6, 128
> vCPUs, 512GB RAM):
>
> * PG15 (4b15868b69)
> PG15: nclients = 1, execution time = 14.181
>
> * PG16 (c24e9ef330)
> PG16: nclients = 1, execution time = 17.112

> The relevant commit is 00d1e02be2 "hio: Use ExtendBufferedRelBy() to
> extend tables more efficiently". With commit 1cbbee0338 (the previous
> commit of 00d1e02be2), I got a better numbers, it didn't have a better
> scalability, though:
>
> PG16: nclients = 1, execution time = 17.444

I think the single client case is indicative of an independent regression, or
rather regressions - it can't have anything to do with the fallocate() issue
and reproduces before that too in your numbers.

1) COPY got slower, due to:
9f8377f7a27 Add a DEFAULT option to COPY FROM

This added a new palloc()/free() to every call to NextCopyFrom(). It's not at
all clear to me why this needs to happen in NextCopyFrom(), particularly
because it's already stored in CopyFromState?
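
To illustrate the general pattern in question (this is not the PostgreSQL code; DemoCopyState and the demo_* functions are hypothetical stand-ins, and plain malloc()/free() replace palloc()/pfree()), here is a minimal sketch contrasting a scratch array allocated on every per-row call with one allocated once and kept in the state:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a COPY state struct; not the real CopyFromState. */
typedef struct DemoCopyState
{
    int   natts;     /* number of columns, assumed > 0 */
    bool *defaults;  /* per-column flag array, allocated once at setup */
} DemoCopyState;

/* Set up the state and allocate the per-column array a single time. */
static DemoCopyState *
demo_begin(int natts)
{
    DemoCopyState *st = malloc(sizeof(DemoCopyState));

    if (st == NULL)
        return NULL;
    st->natts = natts;
    st->defaults = calloc((size_t) natts, sizeof(bool));
    if (st->defaults == NULL)
    {
        free(st);
        return NULL;
    }
    return st;
}

/* Variant A: allocate and free a scratch array on every row. */
static bool
demo_next_row_alloc(DemoCopyState *st)
{
    bool *scratch = calloc((size_t) st->natts, sizeof(bool));

    if (scratch == NULL)
        return false;
    /* ... per-row work would use scratch here ... */
    free(scratch);
    return true;
}

/* Variant B: reset and reuse the array already held in the state. */
static bool
demo_next_row_reuse(DemoCopyState *st)
{
    memset(st->defaults, 0, (size_t) st->natts * sizeof(bool));
    /* ... per-row work would use st->defaults here ... */
    return true;
}

/* Release everything allocated by demo_begin(). */
static void
demo_end(DemoCopyState *st)
{
    free(st->defaults);
    free(st);
}
```

Variant B trades an allocator round trip per row for a memset() over an array that lives as long as the state does, which is the kind of change hinted at above.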

2) pg_strtoint32_safe() got substantially slower, mainly due to:
faff8f8e47f Allow underscores in integer and numeric constants.
6fcda9aba83 Non-decimal integer literals

Pinned to one CPU, with turbo mode disabled, I get the following best-of-three
times for
copy test from '/tmp/tmp_4.data'
(too impatient to use the larger file every time)

15:
6281.107 ms

HEAD:
7000.469 ms

backing out 9f8377f7a27:
6433.516 ms

also backing out faff8f8e47f, 6fcda9aba83:
6235.453 ms
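
To put the four timings above on one scale, a small helper (hypothetical, only for this arithmetic) can express each as a percent change relative to the 15 number:

```c
/*
 * Percent change of a timing relative to a baseline.
 * Positive means slower than the baseline, negative means faster.
 */
static double
pct_vs_baseline(double baseline_ms, double t_ms)
{
    return (t_ms - baseline_ms) / baseline_ms * 100.0;
}
```

With 6281.107 ms as the baseline, HEAD (7000.469 ms) comes out roughly 11.5% slower, backing out 9f8377f7a27 (6433.516 ms) leaves about 2.4%, and additionally backing out faff8f8e47f and 6fcda9aba83 (6235.453 ms) ends up about 0.7% faster than 15.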

I suspect 1) can relatively easily be fixed properly. But 2) seems much
harder. The changes increased the number of branches substantially, and that's
gonna cost in something as (previously) tight as pg_strtoint32().
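
As a rough illustration of where the extra branches come from, here is a simplified sketch (not the PostgreSQL implementation; the demo_* functions and the exact grammar they accept are assumptions made for this example) comparing a decimal-only parser with one that also accepts a 0x prefix and single underscores between digits:

```c
#include <stdbool.h>
#include <stdint.h>

/* Decimal only: one digit-range test per character in the loop. */
static bool
demo_parse_decimal(const char *s, int32_t *out)
{
    int64_t acc = 0;
    bool    neg = false;

    if (*s == '-') { neg = true; s++; }
    else if (*s == '+') s++;
    if (*s < '0' || *s > '9')
        return false;
    while (*s >= '0' && *s <= '9')
    {
        acc = acc * 10 + (*s - '0');
        if (acc > (int64_t) INT32_MAX + 1)
            return false;           /* out of range even for INT32_MIN */
        s++;
    }
    if (*s != '\0' || (!neg && acc > INT32_MAX))
        return false;
    *out = (int32_t) (neg ? -acc : acc);
    return true;
}

/* Value of c as a digit in the given base, or -1 if it is not one. */
static int
demo_digit_value(char c, int base)
{
    int v;

    if (c >= '0' && c <= '9') v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    else return -1;
    return v < base ? v : -1;
}

/* Extended: prefix check, per-base digit test, and underscore handling. */
static bool
demo_parse_extended(const char *s, int32_t *out)
{
    int64_t acc = 0;
    bool    neg = false;
    int     base = 10;

    if (*s == '-') { neg = true; s++; }
    else if (*s == '+') s++;
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) { base = 16; s += 2; }
    if (demo_digit_value(*s, base) < 0)
        return false;               /* must start with a digit */
    for (;;)
    {
        int v = demo_digit_value(*s, base);

        if (v >= 0)
        {
            acc = acc * base + v;
            if (acc > (int64_t) INT32_MAX + 1)
                return false;
            s++;
        }
        else if (*s == '_' && demo_digit_value(s[1], base) >= 0)
            s++;                    /* underscore must precede a digit */
        else
            break;
    }
    if (*s != '\0' || (!neg && acc > INT32_MAX))
        return false;
    *out = (int32_t) (neg ? -acc : acc);
    return true;
}
```

Even for plain decimal input, the extended variant pays for the prefix test and a more complicated per-character classification, which is the sort of overhead described above.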

For higher concurrency numbers, I was now able to reproduce the regression,
though to a smaller degree, and much smaller still after fixing the above. The
reason we run into the issue here is basically that the rows in the test are
very narrow and reach
#define MAX_BUFFERED_TUPLES 1000

at a small number of pages, so we go back and forth between extending with
fallocate() and not.
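
For a feel of the page counts involved, here is a back-of-the-envelope sketch. The 8192-byte block size, the neglect of the page header, and the per-tuple footprint passed in are all assumptions for illustration, not measurements from this test:

```c
#include <stddef.h>

#define DEMO_BLCKSZ 8192    /* assumed block size; page header ignored */

/*
 * Number of pages needed for ntuples rows that each occupy tuple_bytes
 * on the page (tuples do not span pages). Returns 0 for invalid sizes.
 */
static size_t
demo_pages_per_batch(size_t ntuples, size_t tuple_bytes)
{
    size_t per_page;

    if (tuple_bytes == 0 || tuple_bytes > DEMO_BLCKSZ)
        return 0;
    per_page = DEMO_BLCKSZ / tuple_bytes;
    return (ntuples + per_page - 1) / per_page;    /* round up */
}
```

Assuming a hypothetical footprint of 40 bytes per narrow row, a batch of 1000 tuples covers about 5 pages and a batch of 5000 about 25, so each flush at the smaller limit asks for only a handful of new pages.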

I'm *not* saying that that is the solution, but after changing that to 5000,
the numbers look a lot better (with the other regressions "worked around"):

(this is again with turboboost disabled, to get more reproducible numbers)

clients                       1      2      4      8     16     32

15,buffered=1000          25725  13211   9232   5639   4862   4700
15,buffered=5000          26107  14550   8644   6050   4943   4766
HEAD+fixes,buffered=1000  25875  14505   8200   4900   3565   3433
HEAD+fixes,buffered=5000  25830  12975   6527   3594   2739   2642

Greetings,

Andres Freund

[1] https://postgr.es/m/CAD21AoAEwHTLYhuQ6PaBRPXKWN-CgW9iw%2B4hm%3D2EOFXbJQ3tOg%40mail.gmail.com
