Re: Logical decoding without slots: decoding in lockstep with recovery

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical decoding without slots: decoding in lockstep with recovery
Date: 2021-01-13 03:41:58
Message-ID: CAGRY4nwjxmzaC7xzcvBGKBcE5fD=XkuiOpSLUzWud8rGxhweAA@mail.gmail.com
Lists: pgsql-hackers

On Sat, 26 Dec 2020 at 06:51, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2020-12-23 14:56:07 +0800, Craig Ringer wrote:
> > I want to share an idea I've looked at a few times where I've run into
> > situations where logical slots were inadvertently dropped, or where it
> > became necessary to decode changes in the past on a slot.
> >
> > As most of you will know you can't just create a logical slot in the past.
> > Even if it was permitted, it'd be unsafe due to catalog_xmin retention
> > requirements and missing WAL.
> >
> > But if we can arrange a physical replica to replay the WAL of interest and
> > decode each commit as soon as it's replayed by the startup process, we know
> > the needed catalog rows must all exist, so it's safe to decode the change.
> >
> > So it should be feasible to run logical decoding in standby, even without a
> > replication slot, so long as we:
> >
> > * pause startup process after each xl_xact_commit
> > * wake the walsender running logical decoding
> > * decode and process until ReorderBufferCommit for the just-committed xact
> >   returns
> > * wake the startup process to replay up to the next commit
>
> I don't think it's safe to just do this for each xl_xact_commit - we can
> remove needed rows at quite a few places, not just around transaction
> commit.

Good point.

I vaguely recall spotting a possible decoding-on-standby issue with eager
removal of rows that are still ahead of the global xmin when the primary
"knows" they can't be needed based on info about currently running backends. But
when looking over code related to HOT, visibility, and vacuum now I can't
for the life of me remember exactly what it was or find it. Hopefully I
just misunderstood at the time or was getting confused between decoding on
standby and xact streaming.

> Rows needed to correctly decode rows earlier in the transaction
> might not be available by the time the commit record was logged.
>

When can that happen?

> I think you'd basically have to run logical decoding in lockstep with
> WAL replay, i.e. replay one record, then call logical decoding for that
> record, replay the next record, ...
>

That sounds likely to be unusably slow. The only way I can see it having
any hope of moving at a reasonable rate would be to run a decoding session
inside the startup process itself so we don't have to switch back/forth for
each record. But I imagine that'd probably cause a whole other set of
problems.
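
To make the switching cost concrete, here is a toy model of the handoff
between a replaying thread and a decoding thread. It is only a sketch and
not PostgreSQL code: LockstepHandoff, run_lockstep and the tuple-based
"WAL" are hypothetical names invented for illustration. It compares
handoff counts only; the per-commit mode is the variant argued above to be
unsafe.

```python
import threading


class LockstepHandoff:
    """Toy turn-taking between a replayer and a decoder (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = None  # batch replayed but not yet decoded
        self._done = False
        self.handoffs = 0     # number of replayer -> decoder switches

    def replay(self, batch):
        """Hand a replayed batch to the decoder and block until consumed."""
        with self._cond:
            self._pending = batch
            self.handoffs += 1
            self._cond.notify_all()
            self._cond.wait_for(lambda: self._pending is None)

    def finish(self):
        """Tell the decoder that no more batches will arrive."""
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def decode_loop(self, out):
        """Consume batches one at a time until finish() is called."""
        while True:
            with self._cond:
                self._cond.wait_for(
                    lambda: self._pending is not None or self._done)
                if self._pending is None:
                    return
                out.append(self._pending)
                self._pending = None
                self._cond.notify_all()


def run_lockstep(wal, per_record=True):
    """Replay wal, handing off after every record or only after commits."""
    handoff = LockstepHandoff()
    decoded = []
    decoder = threading.Thread(target=handoff.decode_loop, args=(decoded,))
    decoder.start()
    batch = []
    for rec in wal:
        batch.append(rec)
        if per_record or rec[0] == "commit":
            handoff.replay(list(batch))
            batch.clear()
    if batch:  # trailing records with no commit yet
        handoff.replay(list(batch))
    handoff.finish()
    decoder.join()
    flat = [rec for b in decoded for rec in b]
    return flat, handoff.handoffs


if __name__ == "__main__":
    wal = [("insert", 1), ("insert", 1), ("commit", 1),
           ("insert", 2), ("commit", 2)]
    print(run_lockstep(wal, per_record=True)[1])   # handoffs per record
    print(run_lockstep(wal, per_record=False)[1])  # handoffs per commit
```

Each handoff here costs a wakeup in both directions, which is the
overhead a decoding session inside the startup process would avoid.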

> > Can anyone see any obvious problem with this?
>
> The patch for logical decoding on the standby
> https://postgr.es/m/20181212204154.nsxf3gzqv3gesl32%40alap3.anarazel.de
> should provide some of the infrastructure to do this properly. Should
> really commit it. /me adds an entry to the top of the todo list.
>

That would certainly be helpful for quite a number of things.
