Re: Logical decoding on standby

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical decoding on standby
Date: 2016-11-22 02:20:29
Message-ID: CAMsr+YFXbcoCvCsY=CYpgCFJmQLr8X=MYWv04trtgLnVdDa45g@mail.gmail.com
Lists: pgsql-hackers

On 22 November 2016 at 03:14, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2016-11-21 16:17:58 +0800, Craig Ringer wrote:
>> I've prepared a working initial, somewhat raw implementation for
>> logical decoding on physical standbys.
>
> Please attach. Otherwise in a year or two it'll be impossible to look
> this up.

Fair enough. Attached. Easy to apply with "git am".

I'm currently looking at making detection of replay conflict with a
slot work by separating the current catalog_xmin into two effective
parts - the catalog_xmin currently needed by any known slots
(ProcArray->replication_slot_catalog_xmin, as now), and the oldest
actually valid catalog_xmin where we know we haven't removed anything
yet.

That'll be recorded in a new CheckPoint.oldestCatalogXid field and in
ShmemVariableCache (i.e. VariableCacheData.oldestCatalogXid).

Vacuum will be responsible for advancing
VariableCacheData.oldestCatalogXid by writing an expanded
xl_heap_cleanup_info record with a new latestRemovedCatalogXid field
and then advancing the value in the ShmemVariableCache. Vacuum will
only remove rows of catalog or user-catalog tables that are older than
VariableCacheData.oldestCatalogXid.
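
To make the vacuum side concrete, here is a minimal sketch. It is a toy
model, not code from the attached patches: the Toy* types and toy_*
functions are hypothetical stand-ins for VariableCacheData and
xl_heap_cleanup_info. It also assumes that the record simply carries the
new threshold; the exact meaning of latestRemovedCatalogXid relative to
oldestCatalogXid is not pinned down here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; none of these are the real PostgreSQL definitions. */
typedef uint32_t TransactionId;
#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Wraparound-aware "a is older than b"; only meaningful for normal xids. */
static bool
toy_xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical model of the shared-memory threshold. */
typedef struct ToyVariableCache
{
    TransactionId oldestCatalogXid; /* catalog rows at or after this are intact */
} ToyVariableCache;

/* Hypothetical model of the expanded cleanup-info record. */
typedef struct ToyCleanupInfo
{
    TransactionId latestRemovedCatalogXid; /* assumption: holds the new threshold */
} ToyCleanupInfo;

/*
 * Advance the threshold to what the known slots currently need.
 * The record is filled in ("WAL-logged") before the shared value moves,
 * and the threshold never moves backwards. Returns true if it advanced.
 */
static bool
toy_advance_oldest_catalog_xid(ToyVariableCache *cache,
                               TransactionId slots_catalog_xmin,
                               ToyCleanupInfo *rec)
{
    assert(slots_catalog_xmin >= FirstNormalTransactionId);

    if (cache->oldestCatalogXid != InvalidTransactionId &&
        !toy_xid_precedes(cache->oldestCatalogXid, slots_catalog_xmin))
        return false;

    rec->latestRemovedCatalogXid = slots_catalog_xmin;
    cache->oldestCatalogXid = slots_catalog_xmin;
    return true;
}

/* Vacuum may only remove catalog rows older than the logged threshold. */
static bool
toy_catalog_tuple_removable(const ToyVariableCache *cache,
                            TransactionId tuple_xid)
{
    if (cache->oldestCatalogXid == InvalidTransactionId)
        return false;
    return toy_xid_precedes(tuple_xid, cache->oldestCatalogXid);
}
```

The point of the ordering is that nothing is removed until the new
threshold has been logged, so a standby replaying the record always
learns about the removal before it can trip over it.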

This allows recovery on a standby to tell, based on the last
checkpoint + any xl_heap_cleanup_info records used to maintain
ShmemVariableCache, whether the upstream has removed catalog or
user-catalog records it needs. We can signal walsenders with running
xacts to terminate if their xmin passes the threshold, and when they
start a new xact they can check to see if they're still valid and bail
out.
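
A rough sketch of the standby-side check, again as a toy model rather
than anything in the attached patches. ToyStandbyState and the toy_*
functions are hypothetical names; the assumption is that a decoding
session conflicts when its catalog_xmin is older than the threshold
reconstructed from the checkpoint plus replayed cleanup records.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are the real PostgreSQL definitions. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Wraparound-aware "a is older than b"; only meaningful for normal xids. */
static bool
toy_xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical standby-local copy of the upstream's threshold. */
typedef struct ToyStandbyState
{
    TransactionId oldestCatalogXid;
} ToyStandbyState;

/* Start of recovery: seed the threshold from the last checkpoint. */
static void
toy_load_checkpoint(ToyStandbyState *s, TransactionId ckpt_oldest_catalog_xid)
{
    assert(s != NULL);
    s->oldestCatalogXid = ckpt_oldest_catalog_xid;
}

/* Redo of a cleanup-info record: only ever move the threshold forward. */
static void
toy_redo_cleanup_info(ToyStandbyState *s, TransactionId logged_xid)
{
    assert(s != NULL);
    if (s->oldestCatalogXid == InvalidTransactionId ||
        toy_xid_precedes(s->oldestCatalogXid, logged_xid))
        s->oldestCatalogXid = logged_xid;
}

/* True if a session needing catalog rows from catalog_xmin on is now invalid. */
static bool
toy_decoding_conflicts(const ToyStandbyState *s, TransactionId catalog_xmin)
{
    assert(s != NULL);
    if (catalog_xmin == InvalidTransactionId ||
        s->oldestCatalogXid == InvalidTransactionId)
        return false;
    return toy_xid_precedes(catalog_xmin, s->oldestCatalogXid);
}

/* Count the sessions that would have to be signalled to terminate. */
static size_t
toy_count_conflicts(const ToyStandbyState *s,
                    const TransactionId *xmins, size_t n)
{
    size_t conflicts = 0;

    for (size_t i = 0; i < n; i++)
        if (toy_decoding_conflicts(s, xmins[i]))
            conflicts++;
    return conflicts;
}
```

Note that nothing here needs relcache access or per-record catalog
flags; the only input is the threshold carried by the checkpoint and the
cleanup records.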

xl_heap_cleanup_info isn't emitted much, but if adding a field there
is a problem we can instead add an additional xlog buffer that's only
appended when wal_level = logical.

Doing things this way avoids:

* the need for the standby to be able to tell at redo time whether a
RelFileNode is for a catalog or user-catalog relation without access
to relcache; or
* the need to add info on whether a catalog or user-catalog is being
affected to any xlog record that can cause a snapshot conflict on
standby; or
* the need for a completely reliable way to ensure hot_standby_feedback
can never cease to affect the master, even if the user does something dumb

at the cost of sometimes slightly delaying removal of catalog or
user-catalog tuples when wal_level >= hot_standby, a new CheckPoint
field, and a new field in xl_heap_cleanup_info.

The above is not incorporated in the attached patch series; see the
prior post for the status of the attached patches.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Add-an-optional-endpos-LSN-argument-to-pg_recvlogica.patch text/x-patch 10.6 KB
0002-Add-a-pg_recvlogical-wrapper-to-PostgresNode.patch text/x-patch 5.5 KB
0003-Follow-timeline-switches-in-logical-decoding.patch text/x-patch 20.9 KB
0004-PostgresNode-methods-to-wait-for-node-catchup.patch text/x-patch 6.7 KB
0005-Create-new-pg_lsn-class-to-deal-with-awkward-LSNs-in.patch text/x-patch 7.2 KB
0006-Expand-streaming-replication-tests-to-cover-hot-stan.patch text/x-patch 6.1 KB
0007-Send-catalog_xmin-in-hot-standby-feedback-protocol.patch text/x-patch 5.6 KB
0008-Make-walsender-respect-catalog_xmin-in-hot-standby-f.patch text/x-patch 7.4 KB
0009-Allow-GetOldestXmin-.-to-optionally-disregard-the-ca.patch text/x-patch 9.6 KB
0010-Send-catalog_xmin-separately-in-hot_standby_feedback.patch text/x-patch 2.0 KB
0011-Update-comment-on-issues-with-logical-decoding-on-st.patch text/x-patch 2.1 KB
0012-Don-t-attempt-to-export-a-snapshot-from-CREATE_REPLI.patch text/x-patch 1.4 KB
0013-ERROR-if-timeline-is-zero-in-walsender.patch text/x-patch 940 bytes
0014-Permit-logical-decoding-on-standby-with-a-warning.patch text/x-patch 3.1 KB
0015-Tests-for-logical-decoding-on-standby.patch text/x-patch 7.4 KB
0016-Drop-logical-replication-slots-when-redoing-database.patch text/x-patch 13.5 KB
0017-Allow-walsender-to-exit-on-conflict-with-recovery.patch text/x-patch 2.7 KB
0018-Tests-for-db-drop-during-decoding-on-standby.patch text/x-patch 3.2 KB
