Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: masahiko(dot)sawada(at)2ndquadrant(dot)com, sitnikov(dot)vladimir(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Date: 2020-06-02 06:05:27
Message-ID: 20200602.150527.1892561724587572107.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 2 Jun 2020 13:24:56 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Fri, May 29, 2020 at 06:09:06PM +0900, Masahiko Sawada wrote:
> > Yes. Conversely, if we start logical replication in a physical
> > replication connection (i.g. replication=true) we got an error before
> > staring replication:
> >
> > ERROR: logical decoding requires a database connection
> >
> > I think we can prevent that SEGV in a similar way.
>
> Still unconvinced as this restriction stands for logical decoding
> requiring a database connection but it is not necessarily true now as
> physical replication has less restrictions than a logical one.

If we deliberately allow physical replication on a
database-replication connection, we should revise the documentation
that way. On the other hand physical replication has wider access to a
database cluster than logical replication. Thus allowing to start
physical replication on a logical replication connection could
introduce a problem related to privileges. So I think it might be
better that physical and logical replication have separate pg_hba
lines.

Once we explicitly allow physical replication on a logical replication
connection in documentation, it would be far harder to change the
behavior than now.

If we are sure that that cannot be a problem, I don't object the
change in documented behavior.

> Looking at the code, I think that there is some confusion with the
> fake WAL reader used as base reference in InitWalSender() where we
> assume that it could only be used in the context of a non-database WAL
> sender. However, this initialization happens when the WAL sender
> connection is initialized, and what I think this misses is that we
> should try to initialize a WAL reader when actually going through a
> START_REPLICATION command.

At first fake_xlogreader was really a fake one that only provides
callback routines, but it should have been changed to a real
xlogreader at the time it began to store segment information. In that
sense moving to real xlogreader makes sense to me separately from
whether we allow physicalrep on logicalrep connections.

> I can note as well that StartLogicalReplication() moves in this sense
> by setting xlogreader to be the one from logical_decoding_ctx once the
> decoding context has been created.
>
> This results in the attached. The extra test from upthread to check
> that logical decoding is not allowed in a non-database WAL sender is a
> good idea, so I have kept it.

+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory")));

The same error message is accompanied by a DETAILS in some other
places. Don't we need one for this?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-06-02 06:15:15 Re: Small doc improvement about spilled txn tracking
Previous Message Masahiko Sawada 2020-06-02 05:59:55 Re: Small doc improvement about spilled txn tracking