Re: Minimal logical decoding on standbys

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2019-04-05 11:38:39
Message-ID: CAJ3gD9cFg-FgG5P=p7yym2RMQ0cz9bKkOQ+Mp9p6vxm2se1=FA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 3 Apr 2019 at 19:57, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> Oops, it was my own change that caused the hang. Sorry for the noise.
> After using wal_debug, found out that after replaying the LOCK records
> for the catalog pg_auth, it was not releasing it because it had
> actually got stuck in ReplicationSlotDropPtr() itself. In
> ResolveRecoveryConflictWithSlots(), a shared
> ReplicationSlotControlLock was already held before iterating through
> the slots, and now ReplicationSlotDropPtr() again tries to take the
> same lock in exclusive mode for setting slot->in_use, leading to a
> deadlock. I fixed that by releasing the shared lock before calling
> ReplicationSlotDropPtr(), and then re-starting the slots' scan over
> again since we released it. We do similar thing for
> ReplicationSlotCleanup().
>
> Attached is a rebased version of your patch
> logical-decoding-on-standby.patch. This v2 version also has the above
> changes. It also includes the tap test file which is still in WIP
> state, mainly because I have yet to add the conflict recovery handling
> scenarios.

Attached v3 patch includes a new scenario to test conflict recovery
handling by verifying that the conflicting slot gets dropped.

WIth this, I am done with the test changes, except the below question
that I had posted earlier which I would like to have inputs :

Regarding the test result failures, I could see that when we drop a
logical replication slot at standby server, then the catalog_xmin of
physical replication slot becomes NULL, whereas the test expects it to
be equal to xmin; and that's the reason a couple of test scenarios are
failing :

ok 33 - slot on standby dropped manually
Waiting for replication conn replica's replay_lsn to pass '0/31273E0' on master
done
not ok 34 - physical catalog_xmin still non-null
not ok 35 - xmin and catalog_xmin equal after slot drop
# Failed test 'xmin and catalog_xmin equal after slot drop'
# at t/016_logical_decoding_on_replica.pl line 272.
# got:
# expected: 2584

I am not sure what is expected. What actually happens is : the
physical xlot catalog_xmin remains NULL initially, but becomes
non-NULL after the logical replication slot is created on standby.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
logical-decoding-on-standby_v3.patch application/octet-stream 39.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Surafel Temesgen 2019-04-05 11:42:33 Re: FETCH FIRST clause PERCENT option
Previous Message Peter Eisentraut 2019-04-05 11:23:35 Re: log bind parameter values on error