Re: [HACKERS] make async slave to wait for lsn to be replayed

From: Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, dilipbalaut(at)gmail(dot)com, smithpb2250(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Date: 2024-03-19 22:34:51
Message-ID: a1893fee413a6a2f387f419049b29a99@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Bharath Rupireddy, thank you for you review.
But here is some points.

On 2024-03-16 10:02, Bharath Rupireddy wrote:
> 4.1 With invalid LSN succeeds, shouldn't it error out? Or at least,
> add a fast path/quick exit to WaitForLSN()?
> BEGIN AFTER '0/0';

In postgresql '0/0' is Valid pg_lsn, but it is always reached.

> 4.2 With an unreasonably high future LSN, BEGIN command waits
> unboundedly, shouldn't we check if the specified LSN is more than
> pg_last_wal_receive_lsn() error out?
> BEGIN AFTER '0/FFFFFFFF';
> SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
> BEGIN AFTER :'future_receive_lsn';

This case will give ERROR cause '0/FFFFFFFF' + 1 is invalid pg_lsn

> 4.3 With an unreasonably high wait time, BEGIN command waits
> unboundedly, shouldn't we restrict the wait time to some max value,
> say a day or so?
> SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
> BEGIN AFTER :'future_receive_lsn' WITHIN 100000;

Good idea, I put it 1 day. But this limit we should to discuss.

> 6.
> +/* Set all latches in shared memory to signal that new LSN has been
> replayed */
> +void
> +WaitLSNSetLatches(XLogRecPtr curLSN)
> +{
>
> I see this patch is waking up all the waiters in the recovery path
> after applying every WAL record, which IMO is a hot path. Is the
> impact of this change on recovery measured, perhaps using
> https://github.com/macdice/redo-bench or similar tools?
>
> 7. In continuation to comment #6, why not use Conditional Variables
> instead of proc latches to sleep and wait for all the waiters in
> WaitLSNSetLatches?

Waiters are stored in the array sorted by LSN. This help us to wake
only PIDs with replayed LSN. This saves us from scanning of whole
array. So it`s not so hot path.

Add some fixes

1) make waiting timeont more simple (as pg_terminate_backend())
2) removed the 1 minute wait because INTERRUPTS don’t arrive for a
long time, changed it to 0.5 seconds
3) add more tests
4) added and expanded sections in the documentation
5) add default variant of timeout
pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0)
example: pg_wait_lsn('0/31B1B60') equal pg_wait_lsn('0/31B1B60', 0)
6) now big timeout will be restricted to 1 day (86400000ms)
CALL pg_wait_lsn('0/34FB5A1',10000000000);
WARNING: Timeout for pg_wait_lsn() restricted to 1 day

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.com

Attachment Content-Type Size
v13_0001-Procedure-wait-lsn.patch text/x-diff 17.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-03-19 22:35:27 Re: Possibility to disable `ALTER SYSTEM`
Previous Message Andrew Dunstan 2024-03-19 22:24:47 Re: Possibility to disable `ALTER SYSTEM`