Re: Logical decoding on standby

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical decoding on standby
Date: 2016-12-20 07:03:54
Message-ID: f982e621-12a1-ee21-5f27-de457d8a13fb@2ndquadrant.com
Lists: pgsql-hackers

On 07/12/16 07:05, Craig Ringer wrote:
> On 21 November 2016 at 16:17, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> Hi all
>>
>> I've prepared a working initial, somewhat raw implementation for
>> logical decoding on physical standbys.
>
> Hi all
>
> I've attached a significantly revised patch, which now incorporates
> safeguards to ensure that we prevent decoding if the master has not
> retained needed catalogs and cancel decoding sessions that are holding
> up apply because they need too-old catalogs.
>
> The biggest change in this patch, and the main intrusive part, is that
> procArray->replication_slot_catalog_xmin is no longer directly used by
> vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
> added, with a corresponding CheckPoint field. Vacuum notices if
> procArray->replication_slot_catalog_xmin has advanced past
> ShmemVariableCache->oldestCatalogXmin and writes a new xact rmgr
> record with the new value before it copies it to oldestCatalogXmin.
> This means that a standby can now reliably tell when catalogs are
> about to be removed or become candidates for removal, so it can pause
> redo until logical decoding sessions on the standby advance far enough
> that their catalog_xmin passes that point. It also means that if our
> hot_standby_feedback somehow fails to lock in the catalogs our slots
> need on a standby, we can cancel sessions with a conflict with
> recovery.
>
> If wal_level is < logical this won't do anything, since
> replication_slot_catalog_xmin and oldestCatalogXmin will both always
> be 0.
>
> Because oldestCatalogXmin advances eagerly as soon as vacuum sees the
> new replication_slot_catalog_xmin, this won't impact catalog bloat.
>
> Ideally this mechanism won't generally actually be needed, since
> hot_standby_feedback stops the master from removing needed catalogs,
> and we make an effort to ensure that the standby has
> hot_standby_feedback enabled and is using a replication slot. We
> cannot prevent the user from dropping and re-creating the physical
> slot on the upstream, though, and it doesn't look simple to stop them
> turning off hot_standby_feedback or turning off use of a physical slot
> after creating logical slots, either. So we try to stop users shooting
> themselves in the foot, but if they do it anyway we notice and cope
> gracefully. Logging catalog_xmin also helps slots created on standbys
> know where to start, and makes sure we can deal gracefully with a race
> between hs_feedback and slot creation on a standby.
>

Hi,

If this mechanism would not be needed most of the time, wouldn't it be
better to not have it at all and instead have a way to ask the physical
slot what its currently reserved catalog_xmin is? (In most cases the
standby should already know, since it's the one sending the hs_feedback,
but the first slot created on such a standby may need to check.) As for
preventing hs_feedback from being turned off, we can refuse to start with
hs_feedback off when logical slots are detected, as sketched below. We can
also refuse to connect to the master without a physical slot if logical
slots are detected. I don't see a problem with either of those.
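
To illustrate the kind of startup check I mean (just a sketch, not patch
code: have_logical_slots() is made up, hot_standby_feedback is the existing
GUC variable, and where exactly this would run, e.g. around
StartupReplicationSlots(), is open):

    /*
     * Sketch: refuse to start a standby that has logical slots on disk
     * while hot_standby_feedback is disabled, since the master could then
     * remove catalog rows those slots still need.
     */
    if (have_logical_slots() && !hot_standby_feedback)
        ereport(FATAL,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("logical replication slots exist on this standby, but hot_standby_feedback is off"),
                 errhint("Set hot_standby_feedback = on, or drop the logical replication slots.")));

The same shape of check could cover the "no physical slot configured" case.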

You may ask what happens if the user drops and recreates the slot, or
otherwise messes up catalog_xmin on the master. Well, under my proposal
we'd check the slot's catalog_xmin on connect, so we'd notice and could
either mark the local slots broken/drop them, or plainly refuse to connect
to the master, the same way as if it no longer had the required WAL (I'm
not sure which behavior is better); a sketch of that check is below. Note
that the user could mess up catalog_xmin in the same way in your design,
so it's not really a regression.
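
Roughly, the connect-time check could look like the following (again only
a sketch: remote_catalog_xmin would have to come from the master's physical
slot via some new mechanism that doesn't exist yet, and slot_catalog_xmin
stands for the oldest catalog_xmin needed by the local logical slots):

    /*
     * Sketch: bail out if the master's physical slot no longer reserves
     * the catalog rows our logical slots still need.  Marking the local
     * slots invalid would be the alternative to erroring out here.
     */
    if (TransactionIdIsValid(slot_catalog_xmin) &&
        (!TransactionIdIsValid(remote_catalog_xmin) ||
         TransactionIdPrecedes(slot_catalog_xmin, remote_catalog_xmin)))
        ereport(FATAL,
                (errmsg("logical slots on this standby need catalog_xmin %u, "
                        "but the master's physical slot only reserves %u",
                        slot_catalog_xmin, remote_catalog_xmin)));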

In general I don't think it's necessary to WAL-log anything for this. It
will not work without a slot, and therefore not via archiving anyway, so
writing to WAL does not seem to buy us anything. There are some interesting
side effects with cascading (i.e. having A->B->C replication and creating a
logical slot on C), but those should not be insurmountable. Plus it might
even be okay to only allow creating logical slots on standbys connected
directly to the master in v1.

That's all about the approach, but since there are prerequisite patches in
the patchset that don't really depend on it, I will comment on them as
well.

0001 and 0002 add testing infrastructure and look fine to me, possibly
committable.

But in 0003 I don't understand the following code:
> +	if (endpos != InvalidXLogRecPtr && !do_start_slot)
> +	{
> +		fprintf(stderr,
> +				_("%s: cannot use --create-slot or --drop-slot together with --endpos\n"),
> +				progname);
> +		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
> +				progname);
> +		exit(1);
> +	}

Why are --create-slot and --endpos not allowed together?
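
If there is no actual reason for it, I'd expect the check to reject only
the --drop-slot case, something along these lines (untested, and assuming
the existing do_drop_slot flag in pg_recvlogical):

    /* --endpos makes no sense when we only drop the slot and exit */
    if (endpos != InvalidXLogRecPtr && do_drop_slot)
    {
        fprintf(stderr,
                _("%s: cannot use --drop-slot together with --endpos\n"),
                progname);
        fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
                progname);
        exit(1);
    }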

0004 again looks good but depends on 0003.

0005 is the timeline following patch, which is IMHO ready for committer, as
are 0006 and 0008, and I still maintain the opinion that these should go in
soon.

0007 is unfinished, as you said in your mail (it's missing the option to
specify the behavior). And the last one, 0009, is the implementation
discussed above, which I think needs rework. IMHO 0007 and 0009 should
ultimately be merged.

I think parts of this could be committed separately and are IMHO ready for
committer, but there is no way in the CF application to mark only part of a
patch set as ready for committer in order to attract attention.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
