Re: Syncrep and improving latency due to WAL throttling

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Syncrep and improving latency due to WAL throttling
Date: 2023-01-27 21:19:27
Message-ID: 20230127211927.hgx6bnhcyepjmqa6@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-01-27 12:06:49 +0100, Jakub Wartak wrote:
> On Thu, Jan 26, 2023 at 4:49 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > Huh? Why did you remove the GUC?
>
> After reading previous threads, my optimism level of getting it ever
> in shape of being widely accepted degraded significantly (mainly due
> to the discussion of wider category of 'WAL I/O throttling' especially
> in async case, RPO targets in async case and potentially calculating
> global bandwidth).

I think it's quite reasonable to limit this to a smaller scope, particularly
because those other goals are ambitious but pretty vague. IMO the problem with
a lot of the threads is precisely that they aimed at a level of generality
that isn't achievable in one step.

> I've assumed that it is a working sketch, and as such not having GUC name
> right now (just for sync case) would still allow covering various other
> async cases in future without breaking compatibility with potential name GUC
> changes (see my previous "wal_throttle_larger_transactions=<strategies>"
> proposal ).

It's harder to benchmark something like this without a GUC, so I think it's
worth having, even if it's not the final name.

> > SyncRepWaitForLSN() has this comment:
> > * 'lsn' represents the LSN to wait for. 'commit' indicates whether this LSN
> > * represents a commit record. If it doesn't, then we wait only for the WAL
> > * to be flushed if synchronous_commit is set to the higher level of
> > * remote_apply, because only commit records provide apply feedback.
>
> Hm, not sure if I understand: are you saying that we should (in the
> throttled scenario) have some special feedback msgs or not --
> irrespective of the setting? To be honest the throttling shouldn't
> wait for the standby full setting, it's just about slowdown fact (so
> IMHO it would be fine even in remote_write/remote_apply scenario if
> the remote walreceiver just received the data, not necessarily write
> it into file or wait for for applying it). Just this waiting for a
> round-trip ack about LSN progress would be enough to slow down the
> writer (?). I've added some timing log into the draft and it shows
> more or less constantly solid RTT even as it stands:

My problem was that the function header comment for SyncRepWaitForLSN() seems
to say that we don't wait at all if commit=false and synchronous_commit <
remote_apply. But I think that might just be poor wording in the comment:

    [...] 'commit' indicates whether this LSN
     * represents a commit record.  If it doesn't, then we wait only for the WAL
     * to be flushed if synchronous_commit is set to the higher level of
     * remote_apply, because only commit records provide apply feedback.

because the code does something entirely different, afaics:

    /* Cap the level for anything other than commit to remote flush only. */
    if (commit)
        mode = SyncRepWaitMode;
    else
        mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
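
To make the effect of that cap concrete, here is a minimal standalone sketch,
not the actual syncrep.c code. The numeric values of the SYNC_REP_* constants
are assumed to mirror syncrep.h, and effective_wait_mode() and
print_effective_modes() are hypothetical helpers that exist only for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumption: these values mirror the wait-mode constants in syncrep.h. */
#define SYNC_REP_NO_WAIT    (-1)
#define SYNC_REP_WAIT_WRITE 0
#define SYNC_REP_WAIT_FLUSH 1
#define SYNC_REP_WAIT_APPLY 2

#define Min(x, y) ((x) < (y) ? (x) : (y))

/*
 * Hypothetical helper isolating the capping logic quoted above: commit
 * records wait at the configured level, anything else is capped at flush.
 */
int
effective_wait_mode(int sync_rep_wait_mode, bool commit)
{
    assert(sync_rep_wait_mode >= SYNC_REP_NO_WAIT &&
           sync_rep_wait_mode <= SYNC_REP_WAIT_APPLY);

    if (commit)
        return sync_rep_wait_mode;
    return Min(sync_rep_wait_mode, SYNC_REP_WAIT_FLUSH);
}

/* Hypothetical demo: print the effective mode for each configured mode. */
void
print_effective_modes(void)
{
    for (int m = SYNC_REP_WAIT_WRITE; m <= SYNC_REP_WAIT_APPLY; m++)
        printf("configured=%d commit=%d non-commit=%d\n",
               m,
               effective_wait_mode(m, true),
               effective_wait_mode(m, false));
}
```

Under those assumed values, only the remote_apply case changes for a
non-commit LSN: it is capped to a flush wait. The write and flush levels pass
through unchanged, so a non-commit wait still waits in every case, which is
what the header comment fails to convey.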

Greetings,

Andres Freund
