Re: [HACKERS] make async slave to wait for lsn to be replayed

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru>, dilipbalaut(at)gmail(dot)com, smithpb2250(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Date: 2024-03-18 09:54:11
Message-ID: CAPpHfduHiH=ZfsCtKsZx4H97q5z60qRJWvTgKzN8z+m4vdGk8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 18, 2024 at 5:17 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Sun, Mar 17, 2024 at 7:40 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com>
wrote:
> > On Sat, Mar 16, 2024 at 5:05 PM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > > On Sat, Mar 16, 2024 at 4:26 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> > > > In general, it seems this patch has been stuck for a long time on
the
> > > > decision to choose an appropriate UI (syntax), and we thought of
> > > > moving it further so that the other parts of the patch can be
> > > > reviewed/discussed. So, I feel before pushing this we should see
> > > > comments from a few (at least two) other senior members who earlier
> > > > shared their opinion on the syntax. I know we don't have much time
> > > > left but OTOH pushing such a change (where we didn't have a
consensus
> > > > on syntax) without much discussion at this point of time could lead
to
> > > > discussions after commit.
> > >
> > > +1 to gain consensus first on the syntax changes. With this, we might
> > > be violating the SQL standard for explicit txn commands (I stand for
> > > correction about the SQL standard though).
> >
> > Thank you for your feedback. Generally, I agree it's correct to get
> > consensus on syntax first. And yes, this patch has been here since
> > 2016. We didn't get consensus for syntax for 8 years. Frankly
> > speaking, I don't see a reason why this shouldn't take another 8
> > years. At the same time the ability to wait on standby given LSN is
> > replayed seems like pretty basic and simple functionality. Thus, it's
> > quite frustrating it already took that long and still unclear when/how
> > this could be finished.
> >
> > My current attempt was to commit minimal implementation as less
> > invasive as possible. A new clause for BEGIN doesn't require
> > additional keywords and doesn't introduce additional statements. But
> > yes, this is still a new qual. And, yes, Amit you're right that even
> > if I had committed that, there was still a high risk of further
> > debates and revert.
> >
> > Given my specsis about agreement over syntax, I'd like to check
> > another time if we could go without new syntax at all. There was an
> > attempt to implement waiting for lsn as a function. But function
> > holds a snapshot, which could prevent WAL records from being replayed.
> > Releasing a snapshot could break the parent query. But now we have
> > procedures, which need a dedicated statement for the call and can even
> > control transactions. Could we implement a waitlsn in a procedure
> > that:
> >
> > 1. First, check that it was called with non-atomic context (that is,
> > it's not called within a transaction). Trigger error if called with
> > atomic context.
> > 2. Release a snapshot to be able to wait without risk of WAL replay
> > stuck. Procedure is still called within the snapshot. It's a bit of
> > a hack to release a snapshot, but Vacuum statements already do so.
> >
>
> Can you please provide a bit more details with some example what is
> the existing problem with functions and how using procedures will
> resolve it? How will this this address the implicit transaction case
> or do we have any other workaround for those cases?

Please check [1] and [2] for the explanation of the problem with functions.

Also, please find a draft patch implementing the procedure. The issue with
the snapshot is addressed with the following lines.

We first ensure we're in a non-atomic context, then pop an active snapshot
(tricky, but ExecuteVacuum() does the same). Then we should have no active
snapshot and it's safe to wait for lsn replay.

if (context->atomic)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("pg_wait_lsn() must be only called in non-atomic
context")));

if (ActiveSnapshotSet())
PopActiveSnapshot();
Assert(!ActiveSnapshotSet());

The function call could be added either before the BEGIN statement or
before the implicit transaction.

CALL pg_wait_lsn('my_lsn', my_timeout); BEGIN;
CALL pg_wait_lsn('my_lsn', my_timeout); SELECT ...;

Links
1.
https://www.postgresql.org/message-id/CAPpHfduBSN8j5j5Ynn5x%3DThD%3D8ypNd53D608VXGweBsPzxPvqA%40mail.gmail.com
2.
https://www.postgresql.org/message-id/CAPpHfdtiGgn0iS1KbW2HTam-1%2BoK%2BvhXZDAcnX9hKaA7Oe%3DF-A%40mail.gmail.com

------
Regards,
Alexander Korotkov

Attachment Content-Type Size
v11-0001-Implement-AFTER-clause-for-BEGIN-command.patch application/octet-stream 20.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-03-18 10:01:20 Re: Improve readability by using designated initializers when possible
Previous Message Heikki Linnakangas 2024-03-18 09:41:41 Re: Refactoring backend fork+exec code