Re: Syncrep and improving latency due to WAL throttling

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Syncrep and improving latency due to WAL throttling
Date: 2023-02-02 11:12:44
Message-ID: CAKZiRmyR_OBZfvaG03piRxxg7XDC+dmGx50P6Pmn-tMBLLdhVQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 2, 2023 at 11:03 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:

> > I agree that some other concurrent backend's
> > COMMIT could fsync it, but I was wondering if that's a sensible
> > optimization to perform (so that issue_fsync() would be called only
> > for commit/rollback records). I can imagine a scenario with 10 such
> > concurrent backends running, all of them with this $thread-GUC set,
> > that would cause 20k unnecessary fsyncs (?) -- (assuming a single
> > HDD with IOlat=20ms and a standby capable of sync-ack < 0.1ms, that
> > would waste close to 400s just on local fsyncs?). I don't have a
> > strong opinion or in-depth knowledge on this, but that smells like IO waste.
> >
>
> Not sure what optimization you mean,

Let me clarify: something like the following (on top of v3), just
to save IOPS:

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2340,6 +2340,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
         if (sync_method != SYNC_METHOD_OPEN &&
             sync_method != SYNC_METHOD_OPEN_DSYNC)
         {
+            bool        openedLogFile = false;
             if (openLogFile >= 0 &&
                 !XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                                  wal_segment_size))
@@ -2351,9 +2352,15 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
                 openLogTLI = tli;
                 openLogFile = XLogFileOpen(openLogSegNo, tli);
                 ReserveExternalFD();
+                openedLogFile = true;
             }

-            issue_xlog_fsync(openLogFile, openLogSegNo, tli);
+            /* Can we bypass fsync()ing XLOG from the backend if
+             * we have been called without a commit request?
+             * Usually the feature will be off here (XLogDelayPending == false).
+             */
+            if (openedLogFile || !XLogDelayPending)
+                issue_xlog_fsync(openLogFile, openLogSegNo, tli);
         }

Plus maybe some additional logic to ensure that this micro-optimization
for saving IOPS is not enabled when the backend calls
XLogFlush()/XLogWrite() for an actual COMMIT record.
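To make the intended decision explicit, here is a minimal C sketch of it with the additional COMMIT-record logic folded in. The function names are hypothetical (not PostgreSQL functions); the booleans stand in for XLogDelayPending, the openedLogFile flag from the diff, and an assumed way of telling whether the flush was requested for a COMMIT record, which the diff does not have yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical predicate: should a backend that ends up in XLogWrite()
 * issue the local fsync?
 *   delay_pending    - stands in for XLogDelayPending (throttling active)
 *   opened_log_file  - the WAL segment was (re)opened during this call
 *   is_commit_record - the flush was requested for a COMMIT record
 *                      (assumed to be known; the diff does not track it)
 */
static bool
wal_fsync_needed(bool delay_pending, bool opened_log_file, bool is_commit_record)
{
    /* Feature off: keep today's behaviour and always fsync. */
    if (!delay_pending)
        return true;
    /* Same escape hatch as the diff: fsync after opening a segment. */
    if (opened_log_file)
        return true;
    /* The extra logic: never skip the local fsync for a commit. */
    return is_commit_record;
}

/*
 * Count the local fsyncs a sequence of writes would issue, assuming no
 * segment switch happens during the sequence.
 */
static size_t
count_wal_fsyncs(const bool *is_commit, size_t n, bool delay_pending)
{
    size_t fsyncs = 0;

    assert(is_commit != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (wal_fsync_needed(delay_pending, false, is_commit[i]))
            fsyncs++;
    return fsyncs;
}
```

With throttling off every write still fsyncs, as it does today; with throttling on, only commit records (or a freshly opened segment) pay for the local fsync.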

> But I think the backends still have to sleep at some point, so that they
> don't queue too much unflushed WAL - that's kinda the whole point, no?

Yes, but it could be flushed to the standby and written out locally
without being fsynced locally (?), provided it was not a COMMIT. I'm
just wondering whether that makes sense (Question 1).
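For scale, the estimate from my earlier message spelled out: assuming the 20k extra fsyncs are serialized on a single HDD at about 20 ms each, they add up to roughly 400 s, whereas 20k standby acks at under 0.1 ms each would cost under 2 s. The helper name is hypothetical and the inputs are the assumed figures from that estimate, not measurements.

```c
#include <assert.h>

/*
 * Back-of-envelope cost of operations that are waited for one after
 * another: count times per-operation latency, converted to seconds.
 * Hypothetical helper; inputs are assumed figures, not measurements.
 */
static double
serial_wait_seconds(long n_ops, double latency_ms)
{
    assert(n_ops >= 0 && latency_ms >= 0.0);
    return (double) n_ops * latency_ms / 1000.0;
}
```

Calling serial_wait_seconds(20000, 20.0) gives the ~400 s of local fsync time, and serial_wait_seconds(20000, 0.1) the ~2 s upper bound for the standby acks.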

> The issue is more about triggering the throttling too early, before we
> hit the bandwidth limit. Which happens simply because we don't have a
> very good way to decide whether the latency is growing, so the patch
> just throttles everything.

I suppose the maximum TCP bandwidth fluctuates in the real world, so it
couldn't be a hard limit. On the other hand, I can imagine operators
setting something like
"throttle-those-backends-if-global-WALlatencyORrate>XXX"
(an administrative decision). That would be cool to have, but yes, it
would require measuring WAL latency and rate first (which on its own
would make a very nice addition to pg_stat_replication). One thing to
note, though, is that there could be many latencies (and WAL throughput
rates) to consider, e.g. a quorum of 3 synchronous standbys, each with a
different latency: which one should we choose?
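One possible answer to "which one to choose?", as a C sketch that assumes per-standby latencies are already being measured (which, as said, would have to come first). With quorum commit ("ANY k (...)" in synchronous_standby_names) a commit waits for any k standbys, so the latency it sees is roughly the k-th smallest per-standby latency; comparing that against an operator-set threshold gives the "throttle if latency > XXX" rule. quorum_latency_ms(), should_throttle() and the threshold are hypothetical names, not existing PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Effective latency under "ANY k" quorum commit: the k-th smallest of the
 * per-standby latencies (1-based k).  Uses an O(n^2) selection so the input
 * is neither copied nor modified; n is tiny in practice.  NaNs are assumed
 * absent.
 */
static double
quorum_latency_ms(const double *lat_ms, size_t n, size_t k)
{
    assert(lat_ms != NULL && k >= 1 && k <= n);
    for (size_t i = 0; i < n; i++)
    {
        size_t less = 0;
        size_t equal = 0;

        for (size_t j = 0; j < n; j++)
        {
            if (lat_ms[j] < lat_ms[i])
                less++;
            else if (lat_ms[j] == lat_ms[i])
                equal++;
        }
        /* lat_ms[i] is the k-th smallest if k falls inside its tie range */
        if (less < k && k <= less + equal)
            return lat_ms[i];
    }
    return lat_ms[0];           /* not reached for valid input */
}

/* Hypothetical operator rule: throttle if the effective latency > XXX. */
static bool
should_throttle(double effective_latency_ms, double threshold_ms)
{
    return effective_latency_ms > threshold_ms;
}
```

For priority-based sync rep ("FIRST k") the commit instead waits for the k highest-priority connected standbys, so the maximum of those k latencies would be the number to compare against the threshold.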

(Question 2) I think we have simply reached a decision point: is the
WIP/PoC good enough as it is (as Andres wanted, and you +1'd that),
should it work as you propose, or should that stay an idea for the
future?

-J.
