Quick Links

Re: Flushing large data immediately in pqcomm

From:	Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject:	Re: Flushing large data immediately in pqcomm
Date:	2024-03-14 11:22:21
Message-ID:	CAGPVpCSX8bTF61ZL9jOgh1AaY3bgsWnQ6J7WmJK4TV0f2LPnJQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi hackers,

I did some experiments with this patch, after previous discussions. This
probably does not answer all questions, but would be happy to do more if
needed.

First, I updated the patch according to what suggested here [1]. PSA v2.
I tweaked the master branch a bit to not allow any buffering. I compared
HEAD, this patch and no buffering at all.
I also added a simple GUC to control PqSendBufferSize, this change only
allows to modify the buffer size and should not have any impact on
performance.

I again ran the COPY TO STDOUT command and timed it. AFAIU COPY sends data
row by row, and I tried running the command under different scenarios with
different # of rows and row sizes. You can find the test script attached
(see test.sh).
All timings are in ms.

1- row size = 100 bytes, # of rows = 1000000
┌───────────┬────────────┬──────┬──────┬──────┬──────┬──────┐
│ │ 1400 bytes │ 2KB │ 4KB │ 8KB │ 16KB │ 32KB │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ HEAD │ 1036 │ 998 │ 940 │ 910 │ 894 │ 874 │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ patch │ 1107 │ 1032 │ 980 │ 957 │ 917 │ 909 │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ no buffer │ 6230 │ 6125 │ 6282 │ 6279 │ 6255 │ 6221 │
└───────────┴────────────┴──────┴──────┴──────┴──────┴──────┘

2- row size = half of the rows are 1KB and rest is 10KB , # of rows =
1000000
┌───────────┬────────────┬───────┬───────┬───────┬───────┬───────┐
│ │ 1400 bytes │ 2KB │ 4KB │ 8KB │ 16KB │ 32KB │
├───────────┼────────────┼───────┼───────┼───────┼───────┼───────┤
│ HEAD │ 25197 │ 23414 │ 20612 │ 19206 │ 18334 │ 18033 │
├───────────┼────────────┼───────┼───────┼───────┼───────┼───────┤
│ patch │ 19843 │ 19889 │ 19798 │ 19129 │ 18578 │ 18260 │
├───────────┼────────────┼───────┼───────┼───────┼───────┼───────┤
│ no buffer │ 23752 │ 23565 │ 23602 │ 23622 │ 23541 │ 23599 │
└───────────┴────────────┴───────┴───────┴───────┴───────┴───────┘

3- row size = half of the rows are 1KB and rest is 1MB , # of rows = 1000
┌───────────┬────────────┬──────┬──────┬──────┬──────┬──────┐
│ │ 1400 bytes │ 2KB │ 4KB │ 8KB │ 16KB │ 32KB │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ HEAD │ 3137 │ 2937 │ 2687 │ 2551 │ 2456 │ 2465 │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ patch │ 2399 │ 2390 │ 2402 │ 2415 │ 2417 │ 2422 │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ no buffer │ 2417 │ 2414 │ 2429 │ 2418 │ 2435 │ 2404 │
└───────────┴────────────┴──────┴──────┴──────┴──────┴──────┘

3- row size = all rows are 1MB , # of rows = 1000
┌───────────┬────────────┬──────┬──────┬──────┬──────┬──────┐
│ │ 1400 bytes │ 2KB │ 4KB │ 8KB │ 16KB │ 32KB │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ HEAD │ 6113 │ 5764 │ 5281 │ 5009 │ 4885 │ 4872 │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ patch │ 4759 │ 4754 │ 4754 │ 4758 │ 4782 │ 4805 │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ no buffer │ 4756 │ 4774 │ 4793 │ 4766 │ 4770 │ 4774 │
└───────────┴────────────┴──────┴──────┴──────┴──────┴──────┘

Some quick observations:
1- Even though I expect both the patch and HEAD behave similarly in case of
small data (case 1: 100 bytes), the patch runs slightly slower than HEAD.
2- In cases where the data does not fit into the buffer, the patch starts
performing better than HEAD. For example, in case 2, patch seems faster
until the buffer size exceeds the data length. When the buffer size is set
to something larger than 10KB (16KB/32KB in this case), there is again a
slight performance loss with the patch as in case 1.
3- With large row sizes (i.e. sizes that do not fit into the buffer) not
buffering at all starts performing better than HEAD. Similarly the patch
performs better too as it stops buffering if data does not fit into the
buffer.

[1]
https://www.postgresql.org/message-id/CAGECzQTYUhnC1bO%3DzNiSpUgCs%3DhCYxVHvLD2doXNx3My6ZAC2w%40mail.gmail.com

Thanks,
--
Melih Mutlu
Microsoft

Attachment	Content-Type	Size
add-pq_send_buffer_size-GUC.patch	application/x-patch	1.9 KB
test.sh	text/x-sh	1.9 KB
v2-0001-Flush-large-data-immediately-in-pqcomm.patch	application/octet-stream	3.8 KB

In response to

Re: Flushing large data immediately in pqcomm at 2024-02-02 22:38:27 from Andres Freund

Responses

Re: Flushing large data immediately in pqcomm at 2024-03-14 12:12:36 from Robert Haas
Re: Flushing large data immediately in pqcomm at 2024-03-14 12:30:19 from Jelte Fennema-Nio
Re: Flushing large data immediately in pqcomm at 2024-03-14 12:45:59 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Dagfinn Ilmari Mannsåker	2024-03-14 11:25:30	Re: Using the %m printf format more
Previous Message	Peter Eisentraut	2024-03-14 11:07:56	Re: UUID v7