Should wal receiver reply to wal sender more aggressively?

From: Paul Guo <paulguo(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Should wal receiver reply to wal sender more aggressively?
Date: 2021-06-15 15:39:59
Message-ID: CABQrizdE-31=X2yiS=91g5f-hdBSpAmYvc-D3M8zF=5iwedagg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While working on some related issues I found that the wal receiver
tries to call walrcv_receive() loop
before replying the write/flush/apply LSN to wal senders in
XLogWalRcvSendReply(). It is possible
that walrcv_receive() loop receives and writes a lot of xlogs, so it
does not reply those LSN
information in time, thus finally slows down those transactions due to
syncrep wait (assuming default synchronous_commit)

In my TPCB testing, I found the worst case is that 10,466,469 bytes
were consumed in the walrcv_receive() loop.

More seriously, we call XLogWalRcvSendReply(false, false) after
handling those bytes; The first
argument false means no force , i.e. it notifies unless max time of
guc wal_receiver_status_interval
value (10s by default) is reached, so we may have to wait other calls
of XLogWalRcvSendReply()
to notify the wal sender.

I thought and tried enhancing this by force-replying to the wal sender
each when receiving
a maximum bytes (e.g. 128K) but several things confused me:

- What's the purpose of guc wal_receiver_status_interval? The OS
kernel is usually not
efficient when handling small packets but we are not replying that
aggressively so why
is this guc there?

- I run simple TPCB (1000 scaling, 200 connections, shared_buffers,
max_connections tuned)
but found no obvious performance difference with and without the
code change. I did not
see obvious system (IO/CPU/network) bottleneck - probably the
bottleneck is in PG itself.
I did not investigate further at this moment, but the change should
in theory help, no?

Another thing came to my mind is the wal receiver logic:
Currently the wal receiver process does network io, wal write, wal
flush in one process.
Network io is async, blocking at epoll/poll, wal write is mostly
non-blocking, but for wal flush,
probably we could decouple it to a dedicated process. And maybe use
sync_file_range instead
of wal file fsync in issue_xlog_fsync()? We should sync those wal
contents with lower LSN at
first and reply to the wal sender in time, right?.

Below is the related code:

/* See if we can read data immediately */
len = walrcv_receive(wrconn, &buf, &wait_fd);
if (len != 0)
{
/*
* Process the received data,
and any subsequent data we
* can read without blocking.
*/
for (;;)
{
if (len > 0)
{
/*
* Something
was received from primary, so reset
* timeout
*/

last_recv_timestamp = GetCurrentTimestamp();
ping_sent = false;

XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
}
else if (len == 0)
break;
else if (len < 0)
{
ereport(LOG,

(errmsg("replication terminated by primary server"),

errdetail("End of WAL reached on timeline %u at %X/%X.",

startpointTLI,

LSN_FORMAT_ARGS(LogstreamResult.Write))));
endofwal = true;
break;
}
len =
walrcv_receive(wrconn, &buf, &wait_fd);
}

/* Let the primary know that
we received some data. */
XLogWalRcvSendReply(false, false);

/*
* If we've written some
records, flush them to disk and
* let the startup process and
primary server know about
* them.
*/
XLogWalRcvFlush(false);

--
Paul Guo (Vmware)

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2021-06-15 16:06:50 Re: unnesting multirange data types
Previous Message Stephen Frost 2021-06-15 15:39:07 Re: Delegating superuser tasks to new security roles