Logical decoding on standby

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Logical decoding on standby
Date: 2016-11-21 08:17:58
Message-ID: CAMsr+YEVmBJ=dyLw=+kTihmUnGy5_EW4Mig5T0maieg_Zu=XCg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi all

I've prepared a working initial, somewhat raw implementation for
logical decoding on physical standbys. Since it's a series of 20
smallish patches at the moment I won't attach it. You can find the
current version at time of writing here:

https://github.com/postgres/postgres/compare/c5f365f3ab...2ndQuadrant:dev/logical-decoding-on-standby-pg10-v1?expand=1

i.e the tag dev/logical-decoding-on-standby-pg10-v1 in github
repo 2ndQuadrant/postgres.

and whatever I'm working on (subject to rebase, breakage, etc) lives
in the branch dev/logical-decoding-on-standby-pg10 .

Quickstart
===

Compile and install like usual; make sure to install test_decoding
too. To see the functionality in action, configure with
--enable-tap-tests and:

make -C src/test/recovery check

To try manually, initdb a master, set pg_hba.conf to 'trust' on all
replication connections, append to postgresql.conf:

wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
hot_standby_feedback = on

then start the master. Now

psql -d 'master_connstr' -c "SELECT
pg_create_physical_replication_slot('standby1');"

and

pg_basebackup -d 'master_connstr' -X stream -R --slot=standby1

and start the replica.

You can now use pg_recvlogical to create a slot on the replica and
decode changes from it, e.g.

pg_recvlogical -d 'replica_connstr' -S test -P test_decoding --create-slot
pg_recvlogical -d 'replica_connstr' -S 'test' -f - --start

and you'll (hopefully) see subsequent changes you make on the master.
If not, tell me.

Patch series contents
===

This patch series incorporates the following changes:

* Timeline following for logical slots, so they can start decoding on
the correct timeline and follow timeline switches (with tests);
originally [3]

* Add --endpos to pg_recvlogical, with tests; originally [3]

* Splitting of xmin and catalog_xmin on hot standby feedback, so
logical slots on a replica only hold down catalog_xmin on the master,
not the xmin for user tables (with tests). Minimises upstream bloat;
originally [4]

* Suppress export of snapshot when starting logical decoding on
replica. Since we cannot allocate an xid, we cannot export snapshots
on standby. Decoding clients can still do their initial setup via a
slot on the master then switch over, do it via physical copy, etc.

* Require hot_standby_feedback to be enabled when starting logical
decoding on a standby.

* Drop any replication slots from a database when redo'ing database
drop, so we don't leave dangling slots on the replica (with tests).

* Make the walsender respect SIGUSR1 and exit via
RecoveryConflictInterrupt() when it gets
PROCSIG_RECOVERY_CONFLICT_DATABASE (with tests); see [6]

* PostgresNode.pm enhancements for the tests

* New test coverage for logical decoding on standby

Remaining issues
===

* The method used to make the walsender respect conflict with recovery
interrupts may not be entirely safe, see walsender
procsignal_sigusr1_handler thread [5];

* We probably terminate walsenders running inside an output plugin
with a virtual xact whose xmin is below the upstream's global xmin,
even though its catalog xmin is fine, in
ResolveRecoveryConflictWithSnapshot(...). Haven't been able to test
this. Need to only terminate them when the conflict affects relations
accessible in logical decoding, which likely needs the upstream to
send more info in WAL;

* logical decoding timeline following needs tests for cascading
physical replication where an intermediate node is promoted per
timeline following thread [3];

* walsender may need to maintain ThisTimeLineID in more places per
decoding timeline following v10 thread [3];

* it may be desirable to refactor the walsender to deliver cleaner
logical decoding timeline following per the decoding timeline
following v10 thread[3]

also:

* Nothing stops the user from disabling hot_standby_feedback on the
standby or dropping and re-creating the physical slot on the master,
causing needed catalog tuples to get vacuumed away. Since it's not
going to be safe to check slot shmem state from the
hot_standby_feedback verify hook and we let hot_standby_feedback
change at runtime this is going to be hard to fix comprehensively, so
we need to cope with what happens when feedback fails, but:

* We don't yet detect when upstream's catalog_xmin increases past our
needed catalog_xmin and needed catalog tuples are vacuumed away by the
upstream. So we don't invalidate the slot or terminate any active
decoding sessions using the slot. Active decoding sessions often won't
have a vtxid to use with ResolveRecoveryConflictWithVirtualXIDs(),
transaction cancel is not going to be sufficient, and anyway it'll
cancel too aggressively since it doesn't know it's safe to apply
changes that affect only (non-user-catalog) heap tables without
conflict with decoding sessions.

... so this is definitely NOT ready for commit. It does, however, make
logical decoding work on standby.

Next steps
===

Since it doesn't look practical to ensure there's never been a gap in
hot standby feedback or detect such a gap directly, I'm currently
looking at ways to reliably detect when the upstream has removed
tuples we need and error out. That means we need a way to tell when
upstream's catalog_xmin has advanced, which we don't currently have
from xlogs. Checkpoint's oldestXID is insufficient since advance
could've happened since last checkpoint.

Related threads
===

This series supercedes:

* Timeline following for logical slots
[1] https://www.postgresql.org/message-id/flat/CAMsr+YH-C1-X_+s=2nzAPnR0wwqJa-rUmVHSYyZaNSn93MUBMQ(at)mail(dot)gmail(dot)com

* WIP: Failover Slots
[2] https://www.postgresql.org/message-id/CAMsr+YFqtf6ecDVmJSLpC_G8T6KoNpKZZ_XgksODwPN+f=evqg@mail.gmail.com

and incorporates the patches in:

* Logical decoding timeline following take II
[3] https://www.postgresql.org/message-id/flat/CAMsr+YEQB3DbxmCUTTTX4RZy8J2uGrmb5+_ar_joFZNXa81Fug(at)mail(dot)gmail(dot)com

* Send catalog_xmin separately in hot standby feedback
[4] https://www.postgresql.org/message-id/CAMsr+YFi-LV7S8ehnwUiZnb=1h_14PwQ25d-vyUNq-f5S5r=Zg@mail.gmail.com

* Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from
walsender?
[5] https://www.postgresql.org/message-id/CAMsr+YFb3R-t5O0jPGvz9_nsAt2GwwZiLSnYu3=X6mR9RnrbEw@mail.gmail.com

Also relevant:

* Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() in walsender
[6] https://www.postgresql.org/message-id/CAMsr+YFb3R-t5O0jPGvz9_nsAt2GwwZiLSnYu3=X6mR9RnrbEw@mail.gmail.com

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-11-21 08:21:47 Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
Previous Message Magnus Hagander 2016-11-21 07:50:14 Re: jsonb_delete with arrays