Re: Syncrep and improving latency due to WAL throttling

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Syncrep and improving latency due to WAL throttling
Date: 2023-01-25 19:05:35
Message-ID: 20230125190535.cjodxprxtnqyrqgz@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-01-25 14:32:51 +0100, Jakub Wartak wrote:
> In other words, it allows slowing down any backend activity. Any feedback on
> such a feature is welcome, including better GUC name proposals ;) and
> conditions in which such a feature should be disabled even if it were
> enabled globally (right now only anti-wraparound VACUUM comes to mind; it's
> not in the patch).

Such a feature could be useful - but I don't think the current place of
throttling has any hope of working reliably:

> @@ -1021,6 +1025,21 @@ XLogInsertRecord(XLogRecData *rdata,
>  		pgWalUsage.wal_bytes += rechdr->xl_tot_len;
>  		pgWalUsage.wal_records++;
>  		pgWalUsage.wal_fpi += num_fpi;
> +
> +		backendWalInserted += rechdr->xl_tot_len;
> +
> +		if ((synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_APPLY || synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_WRITE) &&
> +			synchronous_commit_flush_wal_after > 0 &&
> +			backendWalInserted > synchronous_commit_flush_wal_after * 1024L)
> +		{
> +			elog(DEBUG3, "throttling WAL down on this session (backendWalInserted=%d)", backendWalInserted);
> +			XLogFlush(EndPos);
> +			/* XXX: refactor SyncRepWaitForLSN() to have different waitevent than default WAIT_EVENT_SYNC_REP */
> +			/* maybe new WAIT_EVENT_SYNC_REP_BIG or something like that */
> +			SyncRepWaitForLSN(EndPos, false);
> +			elog(DEBUG3, "throttling WAL down on this session - end");
> +			backendWalInserted = 0;
> +		}
>  	}

You're blocking in the middle of an XLOG insertion. At that point we will
commonly hold important buffer lwlocks, and we'll often be inside a critical
section (where cancelling / terminating the session isn't possible!). This'd
entail loads of undetectable deadlocks (i.e. hard hangs). And even leaving
that aside, doing an unnecessary XLogFlush() with important locks held will
seriously increase contention.

My best idea for how to implement this in a somewhat safe way would be for
XLogInsertRecord() to set a flag indicating that we should throttle, and set
InterruptPending = true. Then the next CHECK_FOR_INTERRUPTS that's allowed to
proceed (i.e. we'll not be in a critical / interrupts off section) can
actually perform the delay. That should fix the hard deadlock danger and
remove most of the increase in lock contention.
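
Very roughly, and with made-up names (XLogDelayPending / XLogDelayTargetLSN
don't exist anywhere), the shape I have in mind is:

/* backend-local state; names are just placeholders */
static bool			XLogDelayPending = false;
static XLogRecPtr	XLogDelayTargetLSN = InvalidXLogRecPtr;

/* in XLogInsertRecord(), instead of flushing / waiting right there */
if (backendWalInserted > synchronous_commit_flush_wal_after * 1024L)
{
	XLogDelayPending = true;
	XLogDelayTargetLSN = EndPos;
	InterruptPending = true;	/* make CHECK_FOR_INTERRUPTS() notice */
}

/* in ProcessInterrupts(), which returns early while in a critical section
 * or with interrupts held off, so the wait can't turn into a hard hang */
if (XLogDelayPending)
{
	XLogDelayPending = false;
	XLogFlush(XLogDelayTargetLSN);
	SyncRepWaitForLSN(XLogDelayTargetLSN, false);
}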

I don't think doing an XLogFlush() of a record that we just wrote is a good
idea - that'll sometimes significantly increase the number of fdatasyncs/sec
we're doing. To make matters worse, this will often cause partially filled WAL
pages to be flushed out - rewriting a WAL page multiple times can
significantly increase overhead on the storage level. At the very least this'd
have to flush only up to the last fully filled page.
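
Roughly, i.e. rounding the flush target down to an XLOG_BLCKSZ boundary first:

/* only flush up to the last fully filled WAL page, never into a partially
 * filled one */
XLogRecPtr	flushTarget = EndPos - (EndPos % XLOG_BLCKSZ);

if (!XLogRecPtrIsInvalid(flushTarget))
	XLogFlush(flushTarget);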

Just counting the number of bytes inserted by a backend will make the overhead
even worse, as the flush will be triggered for OLTP sessions doing tiny
transactions as well, even though they don't contribute to the problem you're
trying to address. How about counting how many bytes of WAL a backend has
inserted since the last time that backend did an XLogFlush()?

A bulk writer won't do a lot of XLogFlush()es, so the time/bytes since the
last XLogFlush() will increase quickly, triggering a flush at the next
opportunity. But short OLTP transactions will do XLogFlush()es at least at
every commit.
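
That'd amount to something like this, with the counter (backendWalSinceFlush
is just an illustrative name) reset whenever the backend itself flushes:

/* backend-local, name just for illustration */
static uint64	backendWalSinceFlush = 0;

/* in XLogInsertRecord() */
backendWalSinceFlush += rechdr->xl_tot_len;

/* at the end of XLogFlush(), once we actually flushed */
backendWalSinceFlush = 0;

The throttling condition would then test backendWalSinceFlush instead of the
total number of bytes the backend has ever inserted.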

I also suspect the overhead will be more manageable if you were to force a
flush only up to a point further back than the last fully filled page. We
don't want to end up flushing WAL for every page, but if you just have a
backend-local accounting mechanism, I think that's inevitably what you'd end
up with when you have a large number of sessions. But if you'd limit the
flushing to be done to synchronous_commit_flush_wal_after / 2 boundaries, and
only ever to a prior boundary, the number of unnecessary WAL flushes would be
proportional to synchronous_commit_flush_wal_after.
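
Sketched out, with lastThrottleFlushLSN as a hypothetical backend-local
variable remembering the last boundary we flushed to:

/* boundaries at half the configured limit, in bytes */
uint64		boundary = (uint64) synchronous_commit_flush_wal_after * 1024 / 2;
/* only ever flush up to a prior, already completed boundary */
XLogRecPtr	flushTarget = (EndPos / boundary) * boundary;

if (flushTarget > lastThrottleFlushLSN)
{
	XLogFlush(flushTarget);
	SyncRepWaitForLSN(flushTarget, false);
	lastThrottleFlushLSN = flushTarget;
}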

Greetings,

Andres Freund
