Re: Minimal logical decoding on standbys

From: Andres Freund <andres(at)anarazel(dot)de>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2023-04-07 07:50:09
Message-ID: 20230407075009.igg7be27ha2htkbt@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-04-07 08:09:50 +0200, Drouvot, Bertrand wrote:
> Hi,
>
> On 4/7/23 7:56 AM, Andres Freund wrote:
> > Hi,
> >
> > On 2023-04-07 07:02:04 +0200, Drouvot, Bertrand wrote:
> > > Done in V63 attached and did change the associated comment a bit.
> >
> > Can you send your changes incrementally, relative to V62? I'm polishing them
> > right now, and that'd make it a lot easier to apply your changes ontop.
> >
>
> Sure, please find them enclosed.

Thanks.

Here's my current working state - I'll go to bed soon.

Changes:

- shared catalog relations weren't handled correctly, because the dboid is
InvalidOid for them. I wrote a test for that as well.

- ReplicationSlotsComputeRequiredXmin() took invalidated logical slots into
account (ReplicationSlotsComputeLogicalRestartLSN() too, but it never looks
at logical slots)

- I don't think the subset of slot xids that were checked when invalidating
was right. We need to check effective_xmin and effective_catalog_xmin - the
latter was using catalog_xmin.

- similarly, it wasn't right that specifically those two fields were
overwritten when invalidated - as that was done, I suspect the changes might
get lost on a restart...

- As mentioned previously, I did not like all the functions in slot.h, nor
their naming. Not yet quite finished with that, but a good bit further

- There were a lot of unrelated changes, e.g. removing comments like
* NB - this runs as part of checkpoint, so avoid raising errors if possible.

- I still don't like the order of the patches, fixing the walsender patches
after introducing support for logical decoding on standby. Reordered.

- I don't think logical slots being invalidated as checked e.g. in
pg_logical_replication_slot_advance()

- I didn't like much that InvalidatePossiblyObsoleteSlot() switched between
kill() and SendProcSignal() based on the "conflict". There very well could
be reasons to use InvalidatePossiblyObsoleteSlot() with an xid from outside
of the startup process in the future. Instead I made it differentiate based
on MyBackendType == B_STARTUP.

I also:

Added new patch that replaces invalidated_at with a new enum, 'invalidated',
listing the reason for the invalidation. I added a check for !invalidated to
ReplicationSlotsComputeRequiredLSN() etc.

Added new patch moving checks for invalid logical slots into
CreateDecodingContext(). Otherwise we end up with 5 or so checks, which makes
no sense. As far as I can tell the old message in
pg_logical_slot_get_changes_guts() was bogus, one couldn't get there having
"never previously reserved WAL"

Split "Handle logical slot conflicts on standby." into two. I'm not sure that
should stay that way, but it made it easier to hack on
InvalidateObsoleteReplicationSlots.

Todo:
- write a test that invalidated logical slots stay invalidated across a restart
- write a test that invalidated logical slots do not lead to retaining WAL
- Further evolve the API of InvalidateObsoleteReplicationSlots()
- pass in the ReplicationSlotInvalidationCause we're trying to conflict on?
- rename xid to snapshotConflictHorizon, that'd be more in line with the
ResolveRecoveryConflictWithSnapshot and easier to understand, I think

- The test could stand a bit of cleanup and consolidation
- No need to start 4 psql processes to do 4 updates, just do it in one
safe_psql()
- the sequence of drop_logical_slots(), create_logical_slots(),
change_hot_standby_feedback_and_wait_for_xmins(), make_slot_active() is
repeated quite a few times
- the stats queries checking for specific conflict counts, including
preceding tests, is pretty painful. I suggest to reset the stats at the
end of the test instead (likely also do the drop_logical_slot() there).
- it's hard to correlate postgres log and the tap test, because the slots
are named the same across all tests. Perhaps they could have a per-test
prefix?
- numbering tests is a PITA, I had to renumber the later ones, when adding a
test for shared catalog tables

My attached version does include your v62-63 incremental chnages.

Greetings,

Andres Freund

Attachment Content-Type Size
va65-0001-replication-slots-replace-invalidated_at-LSN-wi.patch text/x-diff 5.1 KB
va65-0002-Prevent-use-of-invalidated-logical-slot-in-Crea.patch text/x-diff 4.0 KB
va65-0003-Add-support-for-more-causes-to-InvalidateObsole.patch text/x-diff 15.0 KB
va65-0004-Handle-logical-slot-conflicts-on-standby.patch text/x-diff 8.0 KB
va65-0005-Arrange-for-a-new-pg_stat_database_conflicts-an.patch text/x-diff 10.2 KB
va65-0006-For-cascading-replication-wake-up-physical-wals.patch text/x-diff 9.5 KB
va65-0007-Allow-logical-decoding-on-standby.patch text/x-diff 12.1 KB
va65-0008-New-TAP-test-for-logical-decoding-on-standby.patch text/x-diff 34.8 KB
va65-0009-Doc-changes-describing-details-about-logical-de.patch text/x-diff 2.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Denis Laxalde 2023-04-07 08:01:01 Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Previous Message Masahiko Sawada 2023-04-07 07:41:27 Re: CREATE SUBSCRIPTION -- add missing tab-completes