Re: Minimal logical decoding on standbys

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2019-04-02 16:04:14
Message-ID: 20190402160414.h4wwhetcpggmy3tv@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-04-02 15:26:52 +0530, Amit Khandekar wrote:
> On Thu, 14 Mar 2019 at 15:00, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> > I managed to get a recovery conflict by :
> > 1. Setting hot_standby_feedback to off
> > 2. Creating a logical replication slot on standby
> > 3. Creating a table on master, and insert some data.
> > 2. Running : VACUUM FULL;
> >
> > This gives WARNING messages in the standby log file.
> > 2019-03-14 14:57:56.833 IST [40076] WARNING: slot decoding_standby w/
> > catalog xmin 474 conflicts with removed xid 477
> > 2019-03-14 14:57:56.833 IST [40076] CONTEXT: WAL redo at 0/3069E98
> > for Heap2/CLEAN: remxid 477
> >
> > But I did not add such a testcase into the test file, because with the
> > current patch, it does not do anything with the slot; it just keeps on
> > emitting WARNING in the log file; so we can't test this scenario as of
> > now using the tap test.
>
> I am going ahead with drop-the-slot way of handling the recovery
> conflict. I am trying out using ReplicationSlotDropPtr() to drop the
> slot. It seems the required locks are already in place inside the for
> loop of ResolveRecoveryConflictWithSlots(), so we can directly call
> ReplicationSlotDropPtr() when the slot xmin conflict is found.

Cool.

> As explained above, the only way I could reproduce the conflict is by
> turning hot_standby_feedback off on slave, creating and inserting into
> a table on master and then running VACUUM FULL. But after doing this,
> I am not able to verify whether the slot is dropped, because on slave,
> any simple psql command thereon, waits on a lock acquired on sys
> catache, e.g. pg_authid. Working on it.

I think that indicates a bug somewhere. If replay progressed, it should
have killed the slot, and continued replaying past the VACUUM
FULL. Those symptoms suggest replay is stuck somewhere. I suggest a)
compiling with WAL_DEBUG enabled, and turning on wal_debug=1, b) looking
at a backtrace of the startup process.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2019-04-02 16:41:24 Re: [PATCH v22] GSSAPI encryption support
Previous Message Tom Lane 2019-04-02 15:41:16 Re: Column lookup in a row performance