Re: Minimal logical decoding on standbys

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2019-05-23 17:38:55
Message-ID: CAJ3gD9dKw8OEDjwbTmQ6e3-VGY64keNRxycHF4roJpsqhtssuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 23 May 2019 at 21:29, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2019-05-23 17:39:21 +0530, Amit Khandekar wrote:
> > On Tue, 21 May 2019 at 21:49, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Yeah, I agree we should add such checks to minimize the possibility of
> > reading logical records from a master that has insufficient wal_level.
> > So to summarize :
> > a. CheckLogicalDecodingRequirements() : Add Controlfile wal_level checks
> > b. Call this function call in CreateInitDecodingContext() as well.
> > c. While decoding XLOG_PARAMETER_CHANGE record, emit recovery conflict
> > error if there is an existing logical slot.
> >
> > This made me think more of the race conditions. For instance, in
> > pg_create_logical_replication_slot(), just after
> > CheckLogicalDecodingRequirements and before actually creating the
> > slot, suppose concurrently Controlfile->wal_level is changed from
> > logical to replica. So suppose a new slot does get created. Later the
> > slot is read, so in pg_logical_slot_get_changes_guts(),
> > CheckLogicalDecodingRequirements() is called where it checks
> > ControlFile->wal_level value. But just before it does that,
> > ControlFile->wal_level concurrently changes back to logical, because
> > of replay of another param-change record. So this logical reader will
> > think that the wal_level is sufficient, and will proceed to read the
> > records, but those records are *before* the wal_level change, so these
> > records don't have logical data.
>
> I don't think that's an actual problem, because there's no decoding
> before the slot exists and CreateInitDecodingContext() has determined
> the start LSN. And by that point the slot exists, slo
> XLOG_PARAMETER_CHANGE replay can error out.

So between the start lsn and the lsn for
parameter-change(logical=>replica) record, there can be some records ,
and these don't have logical data. So the slot created will read from
the start lsn, and proceed to read these records, before reading the
parameter-change record.

Can you re-write the below phrase please ? I suspect there is some
letters missing there :
"And by that point the slot exists, slo XLOG_PARAMETER_CHANGE replay
can error out"

Are you saying we want to error out when the postgres replays the
param change record and there is existing logical slot ? I thought you
were suggesting earlier that it's the decoder.c code which should
error out when reading the param-change record.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-05-23 17:43:11 Re: Why could GEQO produce plans with lower costs than the standard_join_search?
Previous Message Euler Taveira 2019-05-23 17:26:13 Fix link for v12