Re: Minimal logical decoding on standbys

From: Andres Freund <andres(at)anarazel(dot)de>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2023-04-07 15:47:57
Message-ID: 20230407154757.ywqnldz4nsycap3g@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-04-07 17:13:13 +0200, Drouvot, Bertrand wrote:
> On 4/7/23 9:50 AM, Andres Freund wrote:
> > I added a check for !invalidated to
> > ReplicationSlotsComputeRequiredLSN() etc.
> >
>
> looked at 65-0001 and it looks good to me.
>
> > Added new patch moving checks for invalid logical slots into
> > CreateDecodingContext(). Otherwise we end up with 5 or so checks, which makes
> > no sense. As far as I can tell the old message in
> > pg_logical_slot_get_changes_guts() was bogus, one couldn't get there having
> > "never previously reserved WAL"
> >
>
> looked at 65-0002 and it looks good to me.
>
> > Split "Handle logical slot conflicts on standby." into two. I'm not sure that
> > should stay that way, but it made it easier to hack on
> > InvalidateObsoleteReplicationSlots.
> >
>
> looked at 65-0003 and the others.

Thanks for checking!

> > Todo:
> > - write a test that invalidated logical slots stay invalidated across a restart
>
> Done in 65-66-0008 attached.

Cool.

> > - write a test that invalidated logical slots do not lead to retaining WAL
>
> I'm not sure how to do that since pg_switch_wal() and friends can't be executed on
> a standby.

You can run pg_switch_wal() on the primary and wait for the resulting records
to have been replayed on the standby.

> > - Further evolve the API of InvalidateObsoleteReplicationSlots()
> > - pass in the ReplicationSlotInvalidationCause we're trying to conflict on?
> > - rename xid to snapshotConflictHorizon, that'd be more in line with the
> > ResolveRecoveryConflictWithSnapshot and easier to understand, I think
> >
>
> Done. The new API can be found in v65-66-InvalidateObsoleteReplicationSlots_API.patch
> attached. It propagates the cause to InvalidatePossiblyObsoleteSlot() where a switch/case
> can now be used.

Integrated. I moved the cause to the first argument; that makes more sense to
me.

> The "default" case does not emit an error since this code runs as part
> of checkpoint.

I made it an error: if that ever happens it's a programming error, not some
data-level inconsistency.
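
A minimal, self-contained sketch of that shape, not the patch code. All names
prefixed with Demo/demo_ are hypothetical stand-ins, the xid comparison ignores
wraparound, and abort() stands in for the error that the real code would raise.
It only illustrates the cause being passed as the first argument and the
"default" branch being treated as a programming error:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for ReplicationSlotInvalidationCause. */
typedef enum DemoInvalidationCause
{
    DEMO_INVAL_WAL_REMOVED,     /* required WAL is gone */
    DEMO_INVAL_HORIZON,         /* needed rows were removed */
    DEMO_INVAL_WAL_LEVEL        /* wal_level too low for logical decoding */
} DemoInvalidationCause;

/* Hypothetical, heavily reduced view of a slot. */
typedef struct DemoSlot
{
    uint64_t    restart_lsn;
    uint32_t    xmin;           /* 0 means "not set" in this sketch */
    bool        logical;
} DemoSlot;

/*
 * Does the slot conflict for the given cause?  The cause comes first,
 * the remaining arguments are only consulted by the matching branch.
 */
bool
demo_slot_conflicts(DemoInvalidationCause cause, const DemoSlot *slot,
                    uint64_t oldest_lsn, uint32_t snapshot_conflict_horizon,
                    bool wal_level_logical)
{
    switch (cause)
    {
        case DEMO_INVAL_WAL_REMOVED:
            return slot->restart_lsn < oldest_lsn;
        case DEMO_INVAL_HORIZON:
            /* simplified: plain comparison, no xid wraparound handling */
            return slot->logical && slot->xmin != 0 &&
                slot->xmin <= snapshot_conflict_horizon;
        case DEMO_INVAL_WAL_LEVEL:
            return slot->logical && !wal_level_logical;
        default:
            /* An unknown cause is a bug in the caller, so fail loudly. */
            fprintf(stderr, "unexpected invalidation cause %d\n", (int) cause);
            abort();
    }
}
```

With a switch over the enum, a newly added cause that is not handled shows up
as a hard failure instead of silently being treated as "no conflict".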

> > - The test could stand a bit of cleanup and consolidation
> > - No need to start 4 psql processes to do 4 updates, just do it in one
> > safe_psql()
>
> Right, done in v65-66-0008-New-TAP-test-for-logical-decoding-on-standby.patch attached.

> > - the sequence of drop_logical_slots(), create_logical_slots(),
> > change_hot_standby_feedback_and_wait_for_xmins(), make_slot_active() is
> > repeated quite a few times
>
> grouped in reactive_slots_change_hfs_and_wait_for_xmins() in 65-66-0008 attached.
>
> > - the stats queries checking for specific conflict counts, including
> > preceding tests, are pretty painful. I suggest resetting the stats at the
> > end of the test instead (likely also do the drop_logical_slot() there).
>
> Good idea, done in 65-66-0008 attached.
>
> > - it's hard to correlate postgres log and the tap test, because the slots
> > are named the same across all tests. Perhaps they could have a per-test
> > prefix?
>
> Good point. Done in 65-66-0008 attached. Thanks to that and the stats reset the
> check for invalidation is now done in a single function "check_for_invalidation" that looks
> for invalidation messages in the logfile and in pg_stat_database_conflicts.
>
> Thanks for the suggestions: the TAP test is now easier to read/understand.

Integrated all of these.

I think pg_log_standby_snapshot() should be added in "Allow logical decoding
on standby", not the commit adding the tests.

Is this patchset sufficient to subscribe to a publication on a physical
standby, assuming the publication is created on the primary? If so, we should
have at least a minimal test. If not, we should note that restriction
explicitly.

Greetings,

Andres Freund
