Re: [PATCH] Better Performance for PostgreSQL with large INSERTs

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Filip Janus <fjanus(at)redhat(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Philipp Marek <philipp(at)marek(dot)priv(dot)at>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Better Performance for PostgreSQL with large INSERTs
Date: 2025-12-12 12:54:38
Message-ID: CAKZiRmzS+_Lap9dEQNn20SMAcOMrg3Wvo9yPeZZ1Cb0dqJeS=g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 26, 2025 at 3:03 PM Filip Janus <fjanus(at)redhat(dot)com> wrote:
>
>
>
> -Filip-
>
>
> On Tue, Oct 7, 2025 at 4:54 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>> Hi,
>>
>> On 2025-10-07 15:03:29 +0200, Philipp Marek wrote:
>> > > Have you tried to verify that this doesn't cause performance regressions
>> > > in
>> > > other workloads? pq_recvbuf() has this code:
>> > >
>> > ...
>> > >
>> > > I do seem to recall that just increasing the buffer size substantially
>> > > lead to
>> > > more time being spent inside that memmove() (likely due to exceeding
>> > > L1/L2).
>> >
>> >
>> > Do you have any pointers to discussions or other data about that?
>> >
>> >
>> > My (quick) analysis was that clients that send one request,
>> > wait for an answer, then send the next request wouldn't run that code
>> > as there's nothing behind the individual requests that could be moved.
>> >
>> >
>> > But yes, Pipeline Mode[1] might/would be affected.
>> >
>> > The interesting question is how much data can userspace copy before
>> > that means more load than doing a userspace-kernel-userspace round trip.
>> > (I guess that moving 64kB or 128kB should be quicker, especially given
>> > the various CPU mitigations.)
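
That copy-vs-syscall tradeoff can be eyeballed with a toy program along these
lines (just a sketch; numbers will vary a lot with CPU, kernel and mitigation
settings, and it's no substitute for profiling the backend itself):

#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define ITERS 100000

static double
now_sec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int
main(int argc, char **argv)
{
    size_t      n = (argc > 1) ? strtoul(argv[1], NULL, 10) : 64 * 1024;
    char       *buf = malloc(2 * n);
    int         fds[2];
    char        c = 0;
    double      t0, t_move, t_pipe;

    if (buf == NULL || pipe(fds) != 0)
        return 1;
    memset(buf, 'x', 2 * n);

    /* cost of shifting n bytes within the buffer (alternate direction so the
     * compiler cannot collapse the loop into a single call) */
    t0 = now_sec();
    for (int i = 0; i < ITERS; i++)
        memmove(buf + (i & 1) * n, buf + (1 - (i & 1)) * n, n);
    t_move = now_sec() - t0;

    /* cost of one user/kernel/user round trip: write+read 1 byte on a pipe */
    t0 = now_sec();
    for (int i = 0; i < ITERS; i++)
    {
        if (write(fds[1], &c, 1) != 1 || read(fds[0], &c, 1) != 1)
            return 1;
    }
    t_pipe = now_sec() - t0;

    printf("memmove of %zu bytes: %.0f ns, pipe write+read of 1 byte: %.0f ns\n",
           n, t_move / ITERS * 1e9, t_pipe / ITERS * 1e9);
    free(buf);
    return 0;
}

Compile with e.g. gcc -O2 and pass the copy size in bytes as the first argument.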
>>
>> I unfortunately don't remember the details of where I saw it
>> happening. Unfortunately I suspect it'll depend a lot on hardware and
>> operating system details (like the security mitigations you mention) when it
>> matters too.
>>
>>
>> > As long as there are complete requests in the buffer the memmove()
>> > could be avoided; only the initial part of the first incomplete request
>> > might need moving to the beginning.
>>
>> Right. I'd be inclined to say that ought to be addressed as part of this
>> patch; that way we can be pretty sure it's not going to cause regressions.
>
>
> I tried to benchmark the usage of memmove(), but I wasn’t able to hit the memmove() part of the code. This led me to a deeper investigation, and I realized that the memmove() call is probably in a dead part of the code.
> pq_recvbuf is called when PqRecvPointer >= PqRecvLength, while memmove() is called later only if PqRecvLength > PqRecvPointer.
> This results in a contradiction.
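
A quick stand-alone model of that control flow, for anyone following along
(variable names mirror pqcomm.c, but this is a simplified sketch, not the
backend code itself); it shows why the memmove() branch cannot be reached if
the function is only ever entered with the buffer fully consumed:

#include <stdio.h>
#include <string.h>

#define BUF_SIZE 8192

static char PqRecvBuffer[BUF_SIZE];
static int  PqRecvPointer;              /* next byte to fetch from buffer */
static int  PqRecvLength;               /* end of valid data in buffer */

/* simplified stand-in for pq_recvbuf() */
static int
model_recvbuf(void)
{
    if (PqRecvPointer > 0)
    {
        if (PqRecvLength > PqRecvPointer)
        {
            /* "still some unread data": left-justify it in the buffer */
            memmove(PqRecvBuffer, PqRecvBuffer + PqRecvPointer,
                    PqRecvLength - PqRecvPointer);
            PqRecvLength -= PqRecvPointer;
            PqRecvPointer = 0;
            return 1;                   /* memmove branch taken */
        }
        PqRecvLength = PqRecvPointer = 0;
    }
    /* ... the real function would now recv() into PqRecvBuffer ... */
    return 0;
}

int
main(void)
{
    /*
     * Callers such as pq_getbyte() only call pq_recvbuf() from a
     * "while (PqRecvPointer >= PqRecvLength)" loop, so at the call site the
     * buffer is already fully consumed and the memmove branch (which needs
     * PqRecvLength > PqRecvPointer) cannot be entered.
     */
    PqRecvPointer = PqRecvLength = 4096;
    printf("memmove branch taken: %d\n", model_recvbuf());     /* prints 0 */
    return 0;
}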
>
>>
>> > The documentation says
>> >
>> > > Pipelining is less useful, and more complex,
>> > > when a single pipeline contains multiple transactions
>> > > (see Section 32.5.1.3).
>> >
>> > are there any benchmarks/usage statistics for pipeline mode?
>>
>> You can write benchmarks for it using pgbench's pipeline support, with a
>> custom script.
>>
>> Greetings,
>>
>> Andres Freund
>>
> I am also proposing the introduction of a new GUC variable for setting PQ_RECV_BUFFER_SIZE in the first patch. And the second patch removes the dead code.
>

Hi Filip,

Can you please describe how you have verified that it gives you some perf. increase?

3 tries each, best:

@ pq_recv_buffers = 2MB best of 3:
latency average = 2.594 ms
latency stddev = 0.352 ms
initial connection time = 9.419 ms
tps = 385.431723 (without initial connection time)

@ pq_recv_buffers = default (8kB) best of 3:
latency average = 2.629 ms
latency stddev = 0.929 ms
initial connection time = 9.937 ms
tps = 380.336257 (without initial connection time)

/usr/pgsql19/bin/pgbench -h xxx -U app -f insert.sql -c 1 -P 1 -n -T 5 -M prepared postgres
where insert.sql was generated with:
echo "CREATE TEMPORARY TABLE IF NOT EXISTS file_storage (data BYTEA STORAGE EXTERNAL) ON COMMIT PRESERVE ROWS;" > insert.sql
echo "INSERT INTO file_storage(data) VALUES ('" >> insert.sql
perl -e 'print "A"x(1024*1024);' >> insert.sql # 1MB
echo "');" >> insert.sql

Some description of the env I had:
- tuned TCP rmem/wmem and set congestion control to BBR (to eliminate TCP as
a bottleneck)
- low RTT (same AWS zone), max 25Gbps total, max 9.6 Gbps single-stream TCP
- as TOAST pglz compression was taking most of the CPU in my case, I changed
it to lz4, which also didn't help much, so I changed it to avoid *any*
compression
- switched to a temporary table to avoid I/O as much as possible,
wal_level=minimal too
- had to use prepared statements as otherwise I was hitting way too much CPU
in the parser (yylex routines)

So I'm looking for a way to demonstrate the effect. I've also written
a simple psycopg2-based LO upload benchmark, as pgbench apparently cannot
benchmark this. Sadly, of course, with that you cannot (I think) disable
compression and/or load into a TEMPORARY table, so it's far worse and
I/O-heavy (as it hits pg_largeobject*).
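
For reference, the kind of upload loop I mean looks roughly like this when
written against libpq's large-object API instead of psycopg2 (sketch only;
connection string, object count and size are made up for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "libpq-fe.h"
#include "libpq/libpq-fs.h"

#define LO_SIZE     (1024 * 1024)       /* 1MB per large object */
#define NUM_UPLOADS 100

int
main(void)
{
    /* connection string is an assumption -- point it at your own server */
    PGconn     *conn = PQconnectdb("host=xxx dbname=postgres user=app");
    char       *data = malloc(LO_SIZE);

    if (PQstatus(conn) != CONNECTION_OK || data == NULL)
    {
        fprintf(stderr, "connection failed: %s\n", PQerrorMessage(conn));
        return 1;
    }
    memset(data, 'A', LO_SIZE);

    for (int i = 0; i < NUM_UPLOADS; i++)
    {
        /* large-object operations must run inside a transaction */
        PQclear(PQexec(conn, "BEGIN"));

        Oid     loid = lo_creat(conn, INV_READ | INV_WRITE);
        int     fd = lo_open(conn, loid, INV_WRITE);

        if (loid == InvalidOid || fd < 0 ||
            lo_write(conn, fd, data, LO_SIZE) != LO_SIZE)
        {
            fprintf(stderr, "upload failed: %s\n", PQerrorMessage(conn));
            return 1;
        }
        lo_close(conn, fd);
        PQclear(PQexec(conn, "COMMIT"));
    }

    free(data);
    PQfinish(conn);
    return 0;
}

Build with something like: gcc -O2 lo_upload.c -I$(pg_config --includedir) -lpq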

-J.
