BUG #17005: Enhancement request: Improve walsender throughput by aggregating multiple messages in one send

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: rony(dot)kurniawan(at)oracle(dot)com
Subject: BUG #17005: Enhancement request: Improve walsender throughput by aggregating multiple messages in one send
Date: 2021-05-13 00:31:53
Message-ID: 17005-3e1030784d5440c4@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17005
Logged by: Rony Kurniawan
Email address: rony(dot)kurniawan(at)oracle(dot)com
PostgreSQL version: 11.7
Operating system: Oracle Linux Server release 7.9
Description:

Hi,

I measured the throughput of reading from a logical replication slot and
found that with a smaller row size (512 bytes) the throughput is 50% lower
than with 1024-byte rows.

tcpdump shows that the Ethernet packets sent by the replication server
contain only one message per packet (see the tcpdump output below).
Maybe this is the intended design to achieve low latency, but it is not
favorable for applications that require high throughput.

Is it possible for PostgreSQL to enable Nagle's algorithm on the streaming
socket for replication?
Or aggregate the messages manually before sending them in one send()?

Thank you,
Rony
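
To make the two options in the question concrete, here is a minimal Python sketch. It is not walsender code and does not reflect how PostgreSQL is implemented; it only illustrates the socket-level ideas. It assumes the sending side currently has TCP_NODELAY set (which the one-message-per-packet capture suggests), and the helper names set_nagle, coalesce, and send_aggregated, as well as the 1400-byte batch limit, are hypothetical choices made for illustration.

```python
import socket


def set_nagle(sock: socket.socket, enabled: bool) -> None:
    """Turn Nagle's algorithm on or off for a TCP socket.

    Nagle is active when TCP_NODELAY is 0, so enabling it clears the option.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0 if enabled else 1)


def coalesce(messages, max_bytes):
    """Group consecutive messages into batches of at most max_bytes.

    A single message larger than max_bytes is emitted as its own batch.
    """
    batch = bytearray()
    for msg in messages:
        # Flush the current batch if adding this message would overflow it.
        if batch and len(batch) + len(msg) > max_bytes:
            yield bytes(batch)
            batch.clear()
        batch += msg
    if batch:
        yield bytes(batch)


def send_aggregated(sock, messages, max_bytes=1400):
    """Send each batch with one sendall() call; return the number of calls.

    max_bytes=1400 is an assumed value chosen to stay under a typical
    Ethernet payload size; it is not taken from PostgreSQL.
    """
    calls = 0
    for batch in coalesce(messages, max_bytes):
        sock.sendall(batch)
        calls += 1
    return calls
```

With the BEGIN, INSERT (512-byte row), and COMMIT messages from the test case below, coalesce() would produce a single batch instead of three separate sends. The trade-off is latency: a batch is only sent once it is full or the input ends, so a real implementation would presumably also need a flush on idle.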

Test case:
The client and server are on different machines, or the server runs in a
Docker container.
create table public.test (id integer generated always as identity, name
varchar(512));
alter table public.test replica identity full;
select * from pg_create_logical_replication_slot('testslot',
'test_decoding');
insert into public.test (name) values (rpad('a', 512, 'a'));
...
insert into public.test (name) values (rpad('a', 512, 'a'));

I used pgbench to insert millions of records into the test table to measure
the throughput, but one insert is enough to show how the server sends the
messages.

client terminal 1:
$ sudo tcpdump -D
1.enp0s3
2.virbr0
3.docker0

$ sudo tcpdump -i 3 -w psql.pcap "tcp port 5432"

client terminal 2:
$ pg_recvlogical --start --slot=testslot -d postgres -h 172.17.0.2 -U postgres -f -

client terminal 1:
$ sudo tcpdump -i 3 -w psql.pcap "tcp port 5432"
ctrl-c
37 packets captured
37 packets received by filter
0 packets dropped by kernel

$ tcpdump --number -nn -A -r psql.pcap
...
22 16:38:37.217677 IP 172.17.0.1.56140 > 172.17.0.2.5432:
...START_REPLICATION SLOT "testslot" LOGICAL 0/0.
...
28 16:38:37.218209 IP 172.17.0.2.5432 > 172.17.0.1.56140: ...BEGIN
1888650
...
30 16:38:37.218332 IP 172.17.0.2.5432 > 172.17.0.1.56140: ...table
public.test: INSERT: id[integer]: 1 name[character
varying]:'aaa...512...aaa'
31 16:38:37.218345 IP 172.17.0.2.5432 > 172.17.0.1.56140: ...COMMIT
1888650
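
For a capture with many inserts, counting packets per sender by hand is tedious. The following Python sketch tallies packets by source port from the text output of tcpdump, so the number of server-sent packets can be compared with the number of replication messages. It assumes the "IP src.port > dst.port:" line layout shown above; the function name count_packets_by_source_port and the regular expression are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Matches the address part of a tcpdump line, e.g.
# "IP 172.17.0.2.5432 > 172.17.0.1.56140:"
PACKET_RE = re.compile(r"IP (\S+)\.(\d+) > (\S+)\.(\d+):")


def count_packets_by_source_port(lines):
    """Return a Counter mapping source port -> number of packets seen."""
    counts = Counter()
    for line in lines:
        match = PACKET_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # The four packet lines quoted in the capture above (payload omitted).
    sample = [
        "22 16:38:37.217677 IP 172.17.0.1.56140 > 172.17.0.2.5432:",
        "28 16:38:37.218209 IP 172.17.0.2.5432 > 172.17.0.1.56140: ...BEGIN",
        "30 16:38:37.218332 IP 172.17.0.2.5432 > 172.17.0.1.56140: ...table",
        "31 16:38:37.218345 IP 172.17.0.2.5432 > 172.17.0.1.56140: ...COMMIT",
    ]
    for port, n in sorted(count_packets_by_source_port(sample).items()):
        print(f"source port {port}: {n} packet(s)")
```

Packets with source port 5432 come from the server. Note that this counts every packet from that port, including bare ACKs, so it gives an upper bound on the packets that carry replication messages.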
