Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

From: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Date: 2021-12-23 00:23:27
Message-ID: CAHg+QDcO_zhgBCMn5SosvhuuCoJ1vKmLjnVuqUEOd4S73B1urw@mail.gmail.com
Lists: pgsql-hackers

Hi Hackers,

I am considering implementing an RPO (recovery point objective) enforcement
feature for Postgres, where WAL writes on the primary are stalled when
the WAL distance between the primary and the standby exceeds a configured
threshold (replica_lag_in_bytes). This feature is particularly useful in
disaster recovery setups where the primary and standby are in different
regions and synchronous replication can't be set up for latency and
performance reasons, yet some level of RPO enforcement is still required.

The idea here is to calculate the lag between the primary and the standby
(Async?) server during XLogInsert and block the caller until the lag is
less than the threshold value. We can calculate the max lag by iterating
over ReplicationSlotCtl->replication_slots. If this is not something we
want to do in core, at least adding a hook for XLogInsert would be of
great value.
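
To make the lag check concrete, here is a rough standalone sketch. It is
not PostgreSQL code: SlotModel, WalPos, max_replica_lag and should_throttle
are hypothetical names, and the slot array only stands in for
ReplicationSlotCtl->replication_slots. Locking and the actual wait are
left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a WAL position (a byte offset in the WAL stream). */
typedef uint64_t WalPos;

/* Hypothetical, simplified model of one replication slot. */
typedef struct
{
    bool   in_use;        /* slot is allocated */
    WalPos confirmed_pos; /* WAL position the standby has confirmed */
} SlotModel;

/*
 * Return the largest distance in bytes between the primary's insert
 * position and the confirmed position of any in-use slot.
 */
static uint64_t
max_replica_lag(const SlotModel *slots, size_t nslots, WalPos insert_pos)
{
    uint64_t max_lag = 0;

    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        uint64_t lag;

        if (!slots[i].in_use)
            continue;

        /* A slot at or ahead of insert_pos has no lag; avoid underflow. */
        if (slots[i].confirmed_pos >= insert_pos)
            continue;

        lag = insert_pos - slots[i].confirmed_pos;
        if (lag > max_lag)
            max_lag = lag;
    }
    return max_lag;
}

/* True if the caller should be blocked: max lag exceeds the threshold. */
static bool
should_throttle(const SlotModel *slots, size_t nslots,
                WalPos insert_pos, uint64_t replica_lag_in_bytes)
{
    return max_replica_lag(slots, nslots, insert_pos) > replica_lag_in_bytes;
}
```

In this sketch the slowest in-use slot decides whether inserts stall, so
one disconnected standby would block the primary. Whether that is the
desired behaviour is part of the question.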

A few other scenarios I can think of with the hook are:

1. Enforcing RPO as described above
2. Enforcing a rate limit and slow throttling when a sync standby is falling
behind (could be flush lag or replay lag)
3. Transactional log rate governance - useful for cloud providers to
provide SKU sizes based on allowed WAL writes.
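
As an illustration of the hook idea for scenario 3, here is a rough
standalone sketch of a function-pointer hook with a token-bucket policy
plugged in. Nothing here exists in PostgreSQL today:
wal_insert_throttle_hook, WalBudget, wal_budget_hook and
wal_insert_allowed are hypothetical names, and the budget numbers are
made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical hook type: called before a WAL record of record_len bytes
 * is inserted. Returns true if the caller should wait.
 */
typedef bool (*wal_insert_throttle_hook_type) (uint64_t record_len,
                                               uint64_t now_ms);

/* NULL means no extension has installed a policy. */
static wal_insert_throttle_hook_type wal_insert_throttle_hook = NULL;

/* Token bucket: the WAL byte budget refills at a fixed rate. */
typedef struct
{
    uint64_t capacity;    /* maximum burst, in bytes */
    uint64_t rate_per_ms; /* refill rate, in bytes per millisecond */
    uint64_t tokens;      /* bytes currently available */
    uint64_t last_ms;     /* time of the last refill */
} WalBudget;

/* Made-up numbers: 1000-byte burst, 10 bytes per millisecond. */
static WalBudget wal_budget = {1000, 10, 1000, 0};

/* Example policy: throttle when the record does not fit in the budget. */
static bool
wal_budget_hook(uint64_t record_len, uint64_t now_ms)
{
    WalBudget *b = &wal_budget;

    assert(b->capacity > 0);

    /* Refill for the elapsed time, capped at the burst capacity. */
    if (now_ms > b->last_ms)
    {
        uint64_t refill = (now_ms - b->last_ms) * b->rate_per_ms;

        b->tokens = (refill >= b->capacity - b->tokens)
            ? b->capacity : b->tokens + refill;
        b->last_ms = now_ms;
    }

    /* Note: a record larger than capacity would never fit. */
    if (record_len > b->tokens)
        return true;

    b->tokens -= record_len;
    return false;
}

/* What the insert path would do: consult the hook if one is installed. */
static bool
wal_insert_allowed(uint64_t record_len, uint64_t now_ms)
{
    if (wal_insert_throttle_hook != NULL &&
        wal_insert_throttle_hook(record_len, now_ms))
        return false;
    return true;
}
```

An extension would set wal_insert_throttle_hook = wal_budget_hook at load
time. The RPO and sync-standby cases could use the same hook with a
different policy function.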

Thoughts?

Thanks,
Satya
