Re: Syncrep and improving latency due to WAL throttling

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Syncrep and improving latency due to WAL throttling
Date: 2023-01-26 13:40:56
Message-ID: CAKZiRmwfUocPjbVKynYZoanxAqhZwUBVJwpEtZR9WXKb_ptTag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 1/25/23 20:05, Andres Freund wrote:
> > Hi,
> >
> > Such a feature could be useful - but I don't think the current place of
> > throttling has any hope of working reliably:
[..]
> > You're blocking in the middle of an XLOG insertion.
[..]
> Yeah, I agree the sleep would have to happen elsewhere.

Fixed.

> > My best idea for how to implement this in a somewhat safe way would be for
> > XLogInsertRecord() to set a flag indicating that we should throttle, and set
> > InterruptPending = true. Then the next CHECK_FOR_INTERRUPTS that's allowed to
> > proceed (i.e. we'll not be in a critical / interrupts off section) can
> > actually perform the delay. That should fix the hard deadlock danger and
> > remove most of the increase in lock contention.
> >
>
> The solution I've imagined is something like autovacuum throttling - do
> some accounting of how much "WAL bandwidth" each process consumed, and
> then do the delay/sleep in a suitable place.
>

By the time you replied I've already tried what Andres has recommended.

[..]
>> At the very least this'd
> > have to flush only up to the last fully filled page.
> >
> Same for the flushes of partially flushed pages - if there's enough of
> small OLTP transactions, we're already having this issue. I don't see
> why would this make it measurably worse. But if needed, we can simply
> backoff to the last page boundary, so that we only flush the complete
> page. That would work too - the goal is not to flush everything, but to
> reduce how much of the lag is due to the current process (i.e. to wait
> for sync replica to catch up).

I've introduced the rounding to the last written page (hopefully).

> > Just counting the number of bytes inserted by a backend will make the overhead
> > even worse, as the flush will be triggered even for OLTP sessions doing tiny
> > transactions, even though they don't contribute to the problem you're trying
> > to address. How about counting how many bytes of WAL a backend has inserted
> > since the last time that backend did an XLogFlush()?
> >
>
> No, we should reset the counter at commit, so small OLTP transactions
> should not reach really trigger this. That's kinda the point, to only
> delay "large" transactions producing a lot of WAL.

Fixed.

> > I also suspect the overhead will be more manageable if you were to force a
> > flush only up to a point further back than the last fully filled page. We
> > don't want to end up flushing WAL for every page, but if you just have a
> > backend-local accounting mechanism, I think that's inevitably what you'd end
> > up with when you have a large number of sessions. But if you'd limit the
> > flushing to be done to synchronous_commit_flush_wal_after / 2 boundaries, and
> > only ever to a prior boundary, the amount of unnecessary WAL flushes would be
> > proportional to synchronous_commit_flush_wal_after.
> >
>
> True, that's kinda what I suggested above as a solution for partially
> filled WAL pages.
>
> I agree this is crude and we'd probably need some sort of "balancing"
> logic that distributes WAL bandwidth between backends, and throttles
> backends producing a lot of WAL.

OK - that's not included (yet?), as it would make this much more complex.

In summary: Attached is a slightly reworked version of this patch.
1. Moved logic outside XLogInsertRecord() under ProcessInterrupts()
2. Flushes up to the last page boundary, still uses SyncRepWaitForLSN()
3. Removed GUC for now (always on->256kB); potentially to be restored
4. Resets backendWal counter on commits

It's still crude, but first tests indicate that it behaves similarly
(to the initial one with GUC = 256kB).

Also from the Bharath email, I've found another patch proposal by
Simon [1] and I would like to avoid opening the Pandora's box again,
but to stress this: this feature is specifically aimed at solving OLTP
latency on *sync*rep (somewhat some code could be added/generalized
later on and this feature could be extended to async case, but this
then opens the question of managing the possible WAL throughput
budget/back throttling - this was also raised in first thread here [2]
by Konstantin).

IMHO it could implement various strategies under user-settable GUC
"wal_throttle_larger_transactions=(sync,256kB)" or
"wal_throttle_larger_transactions=off" , and someday later someone
could teach the code the async case (let's say under
"wal_throttle_larger_transactions=(asyncMaxRPO, maxAllowedLag=8MB,
256kB)"). Thoughts?

[1] - https://www.postgresql.org/message-id/flat/CA%2BU5nMLfxBgHQ1VLSeBHYEMjHXz_OHSkuFdU6_1quiGM0RNKEg%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/71f3e6fb-2fca-a798-856a-f23c8ede2333%40garret.ru

Attachment Content-Type Size
0001-WIP-Syncrep-and-improving-latency-due-to-WAL-throttl.patch application/octet-stream 5.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-01-26 13:41:06 Re: New strategies for freezing, advancing relfrozenxid early
Previous Message Dimos Stamatakis 2023-01-26 13:10:36 Re: pg_upgrade from PG-14.5 to PG-15.1 failing due to non-existing function