Re: Logical decoding on standby

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical decoding on standby
Date: 2016-12-07 06:05:40
Message-ID: CAMsr+YGd6piauiXQpL7imG6QevU_jKmbZWs7bLJP+5-W5mWq8g@mail.gmail.com
Lists: pgsql-hackers

On 21 November 2016 at 16:17, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> Hi all
>
> I've prepared a working initial, somewhat raw implementation for
> logical decoding on physical standbys.

Hi all

I've attached a significantly revised patch. It now incorporates
safeguards that prevent decoding if the master has not retained the
needed catalogs, and that cancel decoding sessions which are holding
up apply because they need too-old catalogs.

The biggest change in this patch, and the main intrusive part, is that
procArray->replication_slot_catalog_xmin is no longer directly used by
vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
added, with a corresponding CheckPoint field. Vacuum notices if
procArray->replication_slot_catalog_xmin has advanced past
ShmemVariableCache->oldestCatalogXmin and writes a new xact rmgr
record with the new value before it copies it to oldestCatalogXmin.
This means that a standby can now reliably tell when catalogs are
about to be removed or become candidates for removal, so it can pause
redo until logical decoding sessions on the standby advance far enough
that their catalog_xmin passes that point. It also means that if our
hot_standby_feedback somehow fails to lock in the catalogs our slots
need on a standby, we can cancel sessions with a conflict with
recovery.
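
To make the ordering concrete, here is a toy model of the scheme
described above. It is only a sketch, not code from the patch: the
class and method names are made up for illustration, xids are plain
integers with no wraparound handling, the "WAL" is just a list, and the
standby side models only the cancel-on-conflict path, not pausing redo.

```python
from dataclasses import dataclass, field
from typing import Dict, List

INVALID_XID = 0  # stands in for InvalidTransactionId


@dataclass
class Master:
    # Models procArray->replication_slot_catalog_xmin (what slots need now).
    replication_slot_catalog_xmin: int = INVALID_XID
    # Models ShmemVariableCache->oldestCatalogXmin (what vacuum honours).
    oldest_catalog_xmin: int = INVALID_XID
    # Hypothetical stand-in for the WAL stream: logged threshold advances.
    wal: List[int] = field(default_factory=list)

    def vacuum_update_catalog_xmin(self) -> None:
        """Log the new threshold first, then make it effective."""
        new_xmin = self.replication_slot_catalog_xmin
        if new_xmin > self.oldest_catalog_xmin:
            self.wal.append(new_xmin)            # write the record first
            self.oldest_catalog_xmin = new_xmin  # only then let vacuum use it


@dataclass
class Standby:
    oldest_catalog_xmin: int = INVALID_XID
    # catalog_xmin needed by each active decoding session, keyed by slot name.
    sessions: Dict[str, int] = field(default_factory=dict)

    def replay_catalog_xmin_advance(self, new_xmin: int) -> List[str]:
        """Apply a logged advance; return the slots whose sessions conflict."""
        conflicting = [name for name, xmin in self.sessions.items()
                       if xmin < new_xmin]
        for name in conflicting:
            del self.sessions[name]  # modelled as cancel on recovery conflict
        self.oldest_catalog_xmin = new_xmin
        return conflicting


master = Master(replication_slot_catalog_xmin=100)
master.vacuum_update_catalog_xmin()

standby = Standby(sessions={"slot_a": 90, "slot_b": 120})
for logged_xmin in master.wal:
    cancelled = standby.replay_catalog_xmin_advance(logged_xmin)
    print("advance to", logged_xmin, "cancels", cancelled)
```

The point of writing the record before updating oldestCatalogXmin is
that the standby always sees the new threshold in the WAL stream ahead
of any catalog cleanup that depends on it.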

If wal_level is < logical this won't do anything, since
replication_slot_catalog_xmin and oldestCatalogXmin will both always
be 0.

Because oldestCatalogXmin advances eagerly as soon as vacuum sees the
new replication_slot_catalog_xmin, this won't impact catalog bloat.

Ideally this mechanism will rarely be needed, since
hot_standby_feedback stops the master from removing needed catalogs,
and we make an effort to ensure that the standby has
hot_standby_feedback enabled and is using a replication slot. We
cannot prevent the user from dropping and re-creating the physical
slot on the upstream, though, and it doesn't look simple to stop them
turning off hot_standby_feedback or turning off use of a physical slot
after creating logical slots, either. So we try to stop users shooting
themselves in the foot, but if they do it anyway we notice and cope
gracefully. Logging catalog_xmin also helps slots created on standbys
know where to start, and makes sure we can deal gracefully with a race
between hs_feedback and slot creation on a standby.

There can be a significant delay for slot creation on standby since we
have to wait until there's a new xl_running_xacts record logged. I'd
like to extend the hot_standby_feedback protocol a little to address
that and some other issues, but that's a separate step.

I haven't addressed Petr's point yet, that "there should be parameter
saying if snapshot should be exported or not and if user asks for it
on standby it should fail". Otherwise I think it's looking fairly
solid.

Due to the amount of churn I landed up flattening the patchset. It
probably makes sense to split it up, likely into the sequence of
changes listed in the commit message. I'll wait for a general opinion
on the validity of this approach first.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment                                                       Content-Type  Size
0001-PostgresNode-methods-to-wait-for-node-catchup.patch         text/x-patch  6.7 KB
0002-Create-new-pg_lsn-class-to-deal-with-awkward-LSNs-in.patch  text/x-patch  7.2 KB
0003-Add-an-optional-endpos-LSN-argument-to-pg_recvlogica.patch  text/x-patch  10.6 KB
0004-Add-a-pg_recvlogical-wrapper-to-PostgresNode.patch          text/x-patch  5.5 KB
0005-Follow-timeline-switches-in-logical-decoding.patch          text/x-patch  20.9 KB
0006-Expand-streaming-replication-tests-to-cover-hot-stan.patch  text/x-patch  6.1 KB
0007-Don-t-attempt-to-export-a-snapshot-from-CREATE_REPLI.patch  text/x-patch  1.4 KB
0008-ERROR-if-timeline-is-zero-in-walsender.patch                text/x-patch  938 bytes
0009-Logical-decoding-on-standby.patch                           text/x-patch  100.8 KB
