Re: [HACKERS] make async slave to wait for lsn to be replayed

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru>, dilipbalaut(at)gmail(dot)com, smithpb2250(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Date: 2024-03-19 11:51:46
Message-ID: CAA4eK1JK_Rxj8YuRmbRFJHhLk9OaCT-pZXzCYa4ke9uSKJbddA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 18, 2024 at 3:24 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> On Mon, Mar 18, 2024 at 5:17 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > 1. First, check that it was called in a non-atomic context (that is,
> > > it's not called within a transaction). Raise an error if called in
> > > an atomic context.
> > > 2. Release the snapshot to be able to wait without the risk of WAL
> > > replay getting stuck. The procedure is still called within a
> > > snapshot. It's a bit of a hack to release the snapshot, but VACUUM
> > > statements already do so.
> > >
> >
> > Can you please provide a bit more detail, with an example, on what
> > the existing problem with functions is and how using procedures will
> > resolve it? How will this address the implicit transaction case,
> > or do we have any other workaround for those cases?
>
> Please check [1] and [2] for the explanation of the problem with functions.
>
> Also, please find a draft patch implementing the procedure. The issue with the snapshot is addressed with the following lines.
>
> We first ensure we're in a non-atomic context, then pop the active snapshot (tricky, but ExecuteVacuum() does the same). After that there should be no active snapshot, so it's safe to wait for LSN replay.
>
>     if (context->atomic)
>         ereport(ERROR,
>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                  errmsg("pg_wait_lsn() must be only called in non-atomic context")));
>
>     if (ActiveSnapshotSet())
>         PopActiveSnapshot();
>     Assert(!ActiveSnapshotSet());
>
> The procedure call could be added either before the BEGIN statement or before a statement that runs in an implicit transaction.
>
> CALL pg_wait_lsn('my_lsn', my_timeout); BEGIN;
> CALL pg_wait_lsn('my_lsn', my_timeout); SELECT ...;
>

I haven't thought in detail about whether there are any other problems
with this idea, but it sounds like it should solve the problems you
described with the function-call approach. BTW, if the application has
to know the LSN up to which the replica needs to wait anyway, why can't
it simply monitor the pg_last_wal_replay_lsn() value?
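
For illustration only, a minimal sketch of what such client-side
monitoring could look like. It parses LSNs in the textual "hi/lo" hex
form and polls until the replay position reaches the target. All names
here (parse_lsn, wait_for_replay, fetch_replay_lsn_fn, fake_standby)
are hypothetical and not part of any patch in this thread. The fetch
callback stands in for a client running
SELECT pg_last_wal_replay_lsn() on the standby; the fake_standby
fetcher only simulates replay progress so the sketch runs without a
server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse an LSN given in textual "hi/lo" hex form, e.g. "0/16B3748". */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi;
    unsigned int lo;
    char         trailing;

    /* Exactly two hex fields must match; trailing characters are rejected. */
    if (sscanf(text, "%X/%X%c", &hi, &lo, &trailing) != 2)
        return false;

    *lsn = ((uint64_t) hi << 32) | lo;
    return true;
}

/*
 * Hypothetical callback: report the standby's current replay LSN.
 * A real client would query pg_last_wal_replay_lsn() here.
 */
typedef bool (*fetch_replay_lsn_fn) (void *arg, uint64_t *lsn);

/*
 * Poll until the replay LSN reaches target, giving up after max_polls
 * attempts or on the first fetch failure.
 */
static bool
wait_for_replay(uint64_t target, int max_polls,
                fetch_replay_lsn_fn fetch, void *arg)
{
    assert(fetch != NULL);

    for (int i = 0; i < max_polls; i++)
    {
        uint64_t replayed;

        if (!fetch(arg, &replayed))
            return false;
        if (replayed >= target)
            return true;
        /* A real client would sleep between polls. */
    }
    return false;
}

/* Simulated standby (assumption for the demo): advances by step per poll. */
struct fake_standby
{
    uint64_t replay_lsn;
    uint64_t step;
};

static bool
fake_fetch(void *arg, uint64_t *lsn)
{
    struct fake_standby *standby = arg;

    standby->replay_lsn += standby->step;
    *lsn = standby->replay_lsn;
    return true;
}
```

In this sketch every poll is a separate round trip, so the poll interval
and max_polls together play the role of the timeout argument in the
CALL examples quoted above.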

--
With Regards,
Amit Kapila.
