Re: Syncrep and improving latency due to WAL throttling

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Syncrep and improving latency due to WAL throttling
Date: 2023-01-27 20:45:16
Message-ID: 8cc3b0c3-9e78-32a1-fa32-9c1575a1ca3a@enterprisedb.com
Lists: pgsql-hackers

On 1/27/23 08:18, Bharath Rupireddy wrote:
> On Thu, Jan 26, 2023 at 9:21 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>>> 7. I think we need to not let backends throttle too frequently even
>>> though they have crossed wal_throttle_threshold bytes. The best way is
>>> to rely on replication lag, after all the goal of this feature is to
>>> keep replication lag under check - say, throttle only when
>>> wal_distance > wal_throttle_threshold && replication_lag >
>>> wal_throttle_replication_lag_threshold.
>>
>> I think my idea of only forcing to flush/wait an LSN some distance in the past
>> would automatically achieve that?
>
> I'm sorry, I couldn't get your point, can you please explain it a bit more?
>

The idea is that we would not flush the exact current LSN, because
that's likely somewhere in the middle of a page, and we always write
whole pages, which leads to write amplification.

But if we backed off a bit, and wrote e.g. to the last page boundary,
that wouldn't have this issue (either the page was already flushed -
noop, or we'd have to flush it anyway).
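
To make the rounding concrete, here is a minimal sketch (not code from
the patch). It assumes the default WAL block size of 8192 bytes
(XLOG_BLCKSZ) and treats LSNs as plain byte offsets; flush_target(),
needs_flush() and extra_backoff are hypothetical names used only for
illustration.

```python
XLOG_BLCKSZ = 8192  # assumption: default WAL block size in bytes


def flush_target(current_lsn: int, extra_backoff: int = 0) -> int:
    """Back off from the current LSN, then round down to a WAL page boundary."""
    backed_off = max(current_lsn - extra_backoff, 0)
    return backed_off - (backed_off % XLOG_BLCKSZ)


def needs_flush(target_lsn: int, flushed_lsn: int) -> bool:
    """A flush is a no-op when everything up to the target is already flushed."""
    return target_lsn > flushed_lsn


# Current insert position is mid-page; the target is the start of that page.
target = flush_target(20000)
print(target, needs_flush(target, flushed_lsn=16384))
```

Because the target is always a page boundary, the partially filled last
page is never written just for the sake of throttling.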

We could even back off a bit more, to increase the probability it was
actually flushed / sent to standby. That would still work, because the
whole point is not to allow one process to generate too much unflushed
WAL, forcing the other (small) xacts to wait at commit.

Imagine we have the limit set to 8MB, i.e. the backend flushes WAL after
generating 8MB of WAL. If we flush to the exact current LSN, the other
backends will wait for ~4MB on average. If we back off by 1MB, the
average wait increases to ~4.5MB. (This is simplified, as it ignores WAL
from the small xacts. But those flush regularly, which limits the
amount. It also ignores that there might be multiple large xacts.)
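
A quick way to check those numbers, under one simplified reading of the
scenario: a single large backend, a flush triggered whenever unflushed
WAL reaches the limit, and other backends arriving at a uniformly random
moment. Under that assumption the unflushed amount cycles between the
backoff and the limit, so the average is their midpoint. The function
names below are made up for this sketch.

```python
MB = 1024 * 1024


def average_unflushed(limit: int, backoff: int) -> float:
    """Closed form: unflushed WAL sweeps uniformly from backoff up to limit."""
    return (limit + backoff) / 2


def simulated_average(limit: int, backoff: int, step: int = 64 * 1024) -> float:
    """Walk through one flush cycle in small steps and average the backlog."""
    unflushed = backoff  # right after a flush, only the backoff remains
    total = 0.0
    samples = 0
    while unflushed < limit:
        total += unflushed + step / 2  # midpoint of this step
        samples += 1
        unflushed += step
    return total / samples


for backoff in (0, 1 * MB):
    print(backoff // MB,
          average_unflushed(8 * MB, backoff) / MB,
          simulated_average(8 * MB, backoff) / MB)
```

The same model ignores WAL from small xacts and multiple large xacts,
just like the estimate above.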

> Looking at the patch, the feature, in its current shape, focuses on
> improving replication lag (by throttling WAL on the primary) only when
> synchronous replication is enabled. Why is that? Why can't we design
> it for replication in general (async, sync, and logical replication)?
>

This focuses on sync rep, because that's where the commit latency comes
from. Async doesn't have that issue, because it doesn't wait for the
standby.

In particular, the trick is in penalizing the backends generating a lot
of WAL, while leaving the small xacts alone.
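
As a rough illustration of why a per-backend byte counter has that
effect (a toy model, not the patch itself): each backend counts the WAL
it generated since it last waited, and only pays the penalty once the
counter crosses the threshold. The Backend class, its fields, and the
assumption that a commit resets the counter are all hypothetical.

```python
MB = 1024 * 1024


class Backend:
    """Toy model of a backend that throttles itself by WAL volume."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.wal_since_wait = 0  # bytes generated since the last wait
        self.throttle_waits = 0  # how often this backend was penalized

    def insert_wal(self, nbytes: int) -> None:
        self.wal_since_wait += nbytes
        if self.wal_since_wait >= self.threshold:
            # Stand-in for "flush and wait for the sync standby".
            self.throttle_waits += 1
            self.wal_since_wait = 0

    def commit(self) -> None:
        # Assumption: the commit waits for the standby anyway,
        # so the counter starts over.
        self.wal_since_wait = 0


bulk = Backend(threshold=8 * MB)
for _ in range(20):          # one large xact writing 20MB
    bulk.insert_wal(1 * MB)

small = Backend(threshold=8 * MB)
for _ in range(1000):        # many tiny xacts
    small.insert_wal(4096)
    small.commit()

print(bulk.throttle_waits, small.throttle_waits)
```

In this model only the bulk backend ever hits the threshold; the small
xacts never accumulate enough WAL between commits to be throttled.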

> Keeping replication lag under check enables one to provide a better
> RPO guarantee as discussed in the other thread
> https://www.postgresql.org/message-id/CAHg%2BQDcO_zhgBCMn5SosvhuuCoJ1vKmLjnVuqUEOd4S73B1urw%40mail.gmail.com.
>

Isn't that a bit over-complicated? RPO generally only cares about xacts
that committed (because that's what you want to not lose), so why not
simply introduce a "sync mode" that uses a slightly older LSN when
waiting for the replica? Seems much simpler and similar to what we
already do.

Yeah, if someone generates a lot of WAL in an uncommitted transaction,
all of that may be lost. But who cares (from the RPO point of view)?
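
A sketch of what such a relaxed "sync mode" might boil down to, purely
to illustrate the idea: the commit waits for the replica to confirm an
LSN some fixed distance behind the commit record rather than the commit
record itself. No such mode exists today; allowed_lag and both function
names are invented here, and LSNs are treated as plain byte offsets.

```python
def commit_wait_target(commit_lsn: int, allowed_lag: int) -> int:
    """LSN the replica must confirm before the commit may return."""
    return max(commit_lsn - allowed_lag, 0)


def commit_can_return(commit_lsn: int, replica_flush_lsn: int,
                      allowed_lag: int) -> bool:
    """True once the replica has confirmed the (older) wait target."""
    return replica_flush_lsn >= commit_wait_target(commit_lsn, allowed_lag)


# allowed_lag=0 behaves like regular sync rep: wait for the commit LSN itself.
print(commit_can_return(1000, 900, allowed_lag=0))
print(commit_can_return(1000, 900, allowed_lag=100))
```

With allowed_lag set to zero this degenerates to the existing behavior,
which is why it seems similar to what we already do.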

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
