Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Date: 2021-12-24 12:01:37
Message-ID: CALj2ACVU9QQe1MJoSe9o1Q3wh+=s_12bii-HCL_rhGdd0-W8EA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 24, 2021 at 4:43 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Dec 24, 2021 at 3:27 AM SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com> wrote:
>>
>> XLogInsert in my opinion is the best place to call it and the hook can be something like this "void xlog_insert_hook(NULL)" as all the throttling logic required is the current flush position which can be obtained from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.
>
> IMHO, it is not a good idea to call an external hook function inside a critical section. Generally, we ensure that we do not call any code path within a critical section which can throw an error and if we start calling the external hook then we lose that control. It should be blocked at the operation level itself e.g. ALTER TABLE READ ONLY, or by some other hook at a little higher level.

Yeah, good point. It's not advisable to give the control to the
external module in the critical section. For instance, memory
allocation isn't allowed (see [1]) and the ereport(ERROR,....) would
transform to PANIC inside the critical section (see [2], [3]).
Moreover the critical section is to be short-spanned i.e. executing
the as minimal code as possible. There's no guarantee that an external
module would follow these.

I suggest we do it at the level of transaction start i.e. when a txnid
is getting allocated i.e. in AssignTransactionId(). If we do this,
when the limit for the throttling is exceeded, the current txn (even
if it is a long running txn) continues to do the WAL insertions, the
next txns would get blocked. But this is okay and can be conveyed to
the users via documentation if need be. We do block txnid assignments
for parallel workers in this function, so this is a good choice IMO.

Thoughts?

[1]
/*
* You should not do memory allocations within a critical section, because
* an out-of-memory error will be escalated to a PANIC. To enforce that
* rule, the allocation functions Assert that.
*/
#define AssertNotInCriticalSection(context) \
Assert(CritSectionCount == 0 || (context)->allowInCritSection)

[2]
/*
* If we are inside a critical section, all errors become PANIC
* errors. See miscadmin.h.
*/
if (CritSectionCount > 0)
elevel = PANIC;

[3]
* A related, but conceptually distinct, mechanism is the "critical section"
* mechanism. A critical section not only holds off cancel/die interrupts,
* but causes any ereport(ERROR) or ereport(FATAL) to become ereport(PANIC)
* --- that is, a system-wide reset is forced. Needless to say, only really
* *critical* code should be marked as a critical section! Currently, this
* mechanism is only used for XLOG-related code.

Regards,
Bharath Rupireddy.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-12-24 12:02:10 Re: An obsolete comment of pg_stat_statements
Previous Message Masahiko Sawada 2021-12-24 11:23:29 Re: more descriptive message for process termination due to max_slot_wal_keep_size