Re: Minimal logical decoding on standbys

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2019-04-09 16:53:16
Message-ID: CAJ3gD9f+aFrq_fOuUqdtjapHZKP8V8feXZVE2u4PiEp=32BkEw@mail.gmail.com
Lists: pgsql-hackers

On Sat, 6 Apr 2019 at 04:45, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> Thanks for the new version of the patch. Btw, could you add Craig as a
> co-author in the commit message of the next version of the patch? Don't
> want to forget him.

I had already included his name in the earlier patch, but now I have made it easier to spot.

>
> On 2019-04-05 17:08:39 +0530, Amit Khandekar wrote:
> > Regarding the test result failures, I could see that when we drop a
> > logical replication slot at standby server, then the catalog_xmin of
> > physical replication slot becomes NULL, whereas the test expects it to
> > be equal to xmin; and that's the reason a couple of test scenarios are
> > failing :
> >
> > ok 33 - slot on standby dropped manually
> > Waiting for replication conn replica's replay_lsn to pass '0/31273E0' on master
> > done
> > not ok 34 - physical catalog_xmin still non-null
> > not ok 35 - xmin and catalog_xmin equal after slot drop
> > # Failed test 'xmin and catalog_xmin equal after slot drop'
> > # at t/016_logical_decoding_on_replica.pl line 272.
> > # got:
> > # expected: 2584
> >
> > I am not sure what is expected. What actually happens is : the
> > physical xlot catalog_xmin remains NULL initially, but becomes
> > non-NULL after the logical replication slot is created on standby.
>
> That seems like the correct behaviour to me - why would we still have a
> catalog xmin if there's no logical slot?

Yeah ... maybe the earlier implementation behaved differently, which is why
the catalog_xmin didn't become NULL; I'm not sure. Anyway, I have changed
this check accordingly. Details are in the sections below.
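
Roughly, the adjusted assertions now look along these lines (just a sketch;
the exact checks and test names in the attached patch may differ):

# After the logical slot on the standby is dropped, the physical slot's
# catalog_xmin on the master is expected to go back to NULL, while xmin
# stays non-NULL as long as hot_standby_feedback is on.
my $slot = $node_master->slot('decoding_standby');
is($slot->{'catalog_xmin'}, '', 'catalog_xmin null again after slot drop');
isnt($slot->{'xmin'}, '', 'xmin still non-null after slot drop');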

>
>
> > diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
> > index 006446b..5785d2f 100644
> > --- a/src/backend/replication/slot.c
> > +++ b/src/backend/replication/slot.c
> > @@ -1064,6 +1064,85 @@ ReplicationSlotReserveWal(void)
> > }
> > }
> >
> > +void
> > +ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
> > +{
> > + int i;
> > + bool found_conflict = false;
> > +
> > + if (max_replication_slots <= 0)
> > + return;
> > +
> > +restart:
> > + if (found_conflict)
> > + {
> > + CHECK_FOR_INTERRUPTS();
> > + /*
> > + * Wait awhile for them to die so that we avoid flooding an
> > + * unresponsive backend when system is heavily loaded.
> > + */
> > + pg_usleep(100000);
> > + found_conflict = false;
> > + }
> > +
> > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> > + for (i = 0; i < max_replication_slots; i++)
> > + {
> > + ReplicationSlot *s;
> > + NameData slotname;
> > + TransactionId slot_xmin;
> > + TransactionId slot_catalog_xmin;
> > +
> > + s = &ReplicationSlotCtl->replication_slots[i];
> > +
> > + /* cannot change while ReplicationSlotCtlLock is held */
> > + if (!s->in_use)
> > + continue;
> > +
> > + /* not our database, skip */
> > + if (s->data.database != InvalidOid && s->data.database != dboid)
> > + continue;
> > +
> > + SpinLockAcquire(&s->mutex);
> > + slotname = s->data.name;
> > + slot_xmin = s->data.xmin;
> > + slot_catalog_xmin = s->data.catalog_xmin;
> > + SpinLockRelease(&s->mutex);
> > +
> > + if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
> > + {
> > + found_conflict = true;
> > +
> > + ereport(WARNING,
> > + (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
> > + NameStr(slotname), slot_xmin, xid)));
> > + }
> > +
> > + if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
> > + {
> > + found_conflict = true;
> > +
> > + ereport(WARNING,
> > + (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
> > + NameStr(slotname), slot_catalog_xmin, xid)));
> > + }
> > +
> > +
> > + if (found_conflict)
> > + {
> > + elog(WARNING, "Dropping conflicting slot %s", s->data.name.data);
> > + LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
> > + ReplicationSlotDropPtr(s);
> > +
> > + /* We released the lock above; so re-scan the slots. */
> > + goto restart;
> > + }
> > + }
> >
> I think this should be refactored so that the two found_conflict cases
> set a 'reason' variable (perhaps an enum?) to the particular reason, and
> then only one warning should be emitted. I also think that LOG might be
> more appropriate than WARNING - as confusing as that is, LOG is more
> severe than WARNING (see docs about log_min_messages).

What I have in mind is:

ereport(LOG,
        (errcode(ERRCODE_INTERNAL_ERROR),
         errmsg("Dropping conflicting slot %s", s->data.name.data),
         errdetail("%s, removed xid %d.", conflict_str, xid)));

where conflict_str is a dynamically built string containing something like
"slot xmin: 1234, slot catalog_xmin: 5678". So for the user, the errdetail
would look like:
"slot xmin: 1234, catalog_xmin: 5678, removed xid: 9012"
From that, the user can figure out whether it was xmin, catalog_xmin, or
both that conflicted with the removed xid. If we don't do it this way, we
cannot show in a single message when both xmin and catalog_xmin conflict
at the same time.
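
To spell that out, here is a rough C sketch of how the conflict reason could
be collected inside the per-slot loop of ResolveRecoveryConflictWithSlots()
and reported in a single LOG message. The enum name, bit-flag approach and
use of StringInfo are illustrative assumptions and not the actual patch; it
reuses the slot_xmin, slot_catalog_xmin, slotname and xid variables from the
quoted loop and assumes lib/stringinfo.h is included:

/* Illustrative sketch only, not the actual patch. */
typedef enum SlotConflictReason
{
	SLOT_CONFLICT_NONE = 0,
	SLOT_CONFLICT_XMIN = 1 << 0,
	SLOT_CONFLICT_CATALOG_XMIN = 1 << 1
} SlotConflictReason;

/* ... inside the per-slot loop, after reading the xmins under the spinlock: */

int		reason = SLOT_CONFLICT_NONE;

if (TransactionIdIsValid(slot_xmin) &&
	TransactionIdPrecedesOrEquals(slot_xmin, xid))
	reason |= SLOT_CONFLICT_XMIN;

if (TransactionIdIsValid(slot_catalog_xmin) &&
	TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
	reason |= SLOT_CONFLICT_CATALOG_XMIN;

if (reason != SLOT_CONFLICT_NONE)
{
	StringInfoData conflict_str;

	/* Build "slot xmin: N, slot catalog_xmin: M" from the conflicting parts. */
	initStringInfo(&conflict_str);
	if (reason & SLOT_CONFLICT_XMIN)
		appendStringInfo(&conflict_str, "slot xmin: %u", slot_xmin);
	if (reason & SLOT_CONFLICT_CATALOG_XMIN)
		appendStringInfo(&conflict_str, "%sslot catalog_xmin: %u",
						 (reason & SLOT_CONFLICT_XMIN) ? ", " : "",
						 slot_catalog_xmin);

	ereport(LOG,
			(errcode(ERRCODE_INTERNAL_ERROR),
			 errmsg("Dropping conflicting slot %s", NameStr(slotname)),
			 errdetail("%s, removed xid %u.", conflict_str.data, xid)));
	pfree(conflict_str.data);

	found_conflict = true;
	/* then release the lock and drop the slot, as in the quoted code above */
}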

Does this message look good to you, or did you have something quite
different in mind?

>
>
> > @@ -0,0 +1,386 @@
> > +# Demonstrate that logical decoding can follow timeline switches.
> > +#
> > +# Test logical decoding on a standby.
> > +#
> > +use strict;
> > +use warnings;
> > +use 5.8.0;
> > +
> > +use PostgresNode;
> > +use TestLib;
> > +use Test::More tests => 55;
> > +use RecursiveCopy;
> > +use File::Copy;
> > +
> > +my ($stdin, $stdout, $stderr, $ret, $handle, $return);
> > +my $backup_name;
> > +
> > +# Initialize master node
> > +my $node_master = get_new_node('master');
> > +$node_master->init(allows_streaming => 1, has_archiving => 1);
> > +$node_master->append_conf('postgresql.conf', q{
> > +wal_level = 'logical'
> > +max_replication_slots = 4
> > +max_wal_senders = 4
> > +log_min_messages = 'debug2'
> > +log_error_verbosity = verbose
> > +# send status rapidly so we promptly advance xmin on master
> > +wal_receiver_status_interval = 1
> > +# very promptly terminate conflicting backends
> > +max_standby_streaming_delay = '2s'
> > +});
> > +$node_master->dump_info;
> > +$node_master->start;
> > +
> > +$node_master->psql('postgres', q[CREATE DATABASE testdb]);
> > +
> > +$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
> > +$backup_name = 'b1';
> > +my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
> > +TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
> > +
> > +sub print_phys_xmin
> > +{
> > + my $slot = $node_master->slot('decoding_standby');
> > + return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
> > +}
> > +
> > +my ($xmin, $catalog_xmin) = print_phys_xmin();
> > +# After slot creation, xmins must be null
> > +is($xmin, '', "xmin null");
> > +is($catalog_xmin, '', "catalog_xmin null");
> > +
> > +my $node_replica = get_new_node('replica');
> > +$node_replica->init_from_backup(
> > + $node_master, $backup_name,
> > + has_streaming => 1,
> > + has_restoring => 1);
> > +$node_replica->append_conf('postgresql.conf',
> > + q[primary_slot_name = 'decoding_standby']);
> > +
> > +$node_replica->start;
> > +$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
> > +
> > +# with hot_standby_feedback off, xmin and catalog_xmin must still be null
> > +($xmin, $catalog_xmin) = print_phys_xmin();
> > +is($xmin, '', "xmin null after replica join");
> > +is($catalog_xmin, '', "catalog_xmin null after replica join");
> > +
> > +$node_replica->append_conf('postgresql.conf',q[
> > +hot_standby_feedback = on
> > +]);
> > +$node_replica->restart;
> > +sleep(2); # ensure walreceiver feedback sent
>
> Can we make this more robust? E.g. by waiting till pg_stat_replication
> shows the change on the primary? Because I can guarantee that this'll
> fail on slow buildfarm machines (say the valgrind animals).
>
>
>
>
> > +$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
> > +sleep(2); # ensure walreceiver feedback sent
>
> Similar.

OK. I have copied the get_slot_xmins() function from t/001_stream_rep.pl
into 016_logical_decoding_on_replica.pl, renamed it to
wait_for_phys_mins(), and used it to wait for the hot_standby_feedback
change to propagate to the master. This function waits until the physical
slot's xmin and catalog_xmin reach the expected values, which depend on
whether there is a logical slot on the standby and whether
hot_standby_feedback is on there.
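
In essence it is a small polling helper along these lines (a simplified
sketch; the signature and the condition shown are assumptions, and the
actual helper in the attached patch may differ):

# Poll the master's physical slot until its xmin/catalog_xmin columns
# satisfy the given condition, instead of sleeping a fixed amount of time.
sub wait_for_phys_mins
{
	my ($node, $slot_name, $condition) = @_;
	$node->poll_query_until(
		'postgres', qq[
		SELECT $condition
		FROM pg_catalog.pg_replication_slots
		WHERE slot_name = '$slot_name';])
	  or die "timed out waiting for physical slot xmins";
}

# e.g. after turning hot_standby_feedback on on the standby (no logical
# slot there yet), xmin should become non-NULL and catalog_xmin stay NULL:
wait_for_phys_mins($node_master, 'decoding_standby',
	'xmin IS NOT NULL AND catalog_xmin IS NULL');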

I was not sure how pg_stat_replication could be used to detect that the
hot_standby_feedback change has reached the master, so I did it the above
way, which I think does pretty much what we want.

The attached v4 patch only has the test case change, plus some minor
cleanup in the test file.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment: logical-decoding-on-standby_v4.patch (application/x-patch, 39.7 KB)
