Re: Timeline following for logical slots

From: Andres Freund <andres(at)anarazel(dot)de>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timeline following for logical slots
Date: 2016-04-04 10:01:16
Message-ID: 20160404100116.GB25969@awork2.anarazel.de
Lists: pgsql-hackers

On 2016-04-04 17:50:02 +0800, Craig Ringer wrote:
> To rephrase per my understanding: The client only specifies the point it
> wants to start seeing decoded commits. Decoding starts from the slot's
> restart_lsn, and that's the point from which the accumulation of reorder
> buffer contents begins, the snapshot building process begins, and where
> accumulation of relcache invalidation information begins. At restart_lsn no
> xact that is to be emitted to the client may yet be in progress. Decoding,
s/yet/already/
> whether or not the xacts will be fed to the output plugin callbacks,
> requires access to the system catalogs. Therefore catalog_xmin reported by
> the slot must be >= the real effective catalog_xmin of the heap and valid
> at the restart_lsn, not just the confirmed flush point or the point the
> client specifies to resume fetching changes from.

Hm. Maybe I'm misunderstanding you here, but doesn't it have to be <=?
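
To make the direction of that inequality concrete, here is a minimal sketch
of the check as I read it. It is not PostgreSQL source: the function names
and the notion of an "oldest xid needed" value are hypothetical, chosen only
for illustration. It assumes normal 32-bit XIDs compared modulo 2^32 and
ignores the special XIDs.

```python
# Hypothetical model of the catalog_xmin invariant; not PostgreSQL code.
XID_MASK = 0xFFFFFFFF
HALF_RANGE = 0x80000000


def xid_precedes_or_equals(a: int, b: int) -> bool:
    """Return True if XID a is logically <= XID b under 32-bit wraparound.

    Assumes both values are normal XIDs; special XIDs are ignored here.
    """
    diff = (a - b) & XID_MASK
    # diff == 0 means equal; a diff in the upper half means a is "behind" b.
    return diff == 0 or diff >= HALF_RANGE


def slot_catalog_xmin_is_safe(slot_catalog_xmin: int, oldest_xid_needed: int) -> bool:
    """The slot is safe only if the catalog_xmin it reports is <= the oldest
    XID whose catalog rows decoding from restart_lsn still needs."""
    return xid_precedes_or_equals(slot_catalog_xmin, oldest_xid_needed)


if __name__ == "__main__":
    print(slot_catalog_xmin_is_safe(100, 200))  # slot holds back far enough
    print(slot_catalog_xmin_is_safe(200, 100))  # slot advanced too far
```

A slot whose reported catalog_xmin is larger than what decoding needs gives
vacuum licence to remove rows that are still required, which is why I'd
expect <= rather than >=.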

> On the original copy of the slot on the pre-failover master the restart_lsn
> would've been further ahead, as would the catalog_xmin. So catalog rows
> have been purged.
+may

> So it's necessary to ensure that the slot's restart_lsn and catalog_xmin
> are advanced in a timely, consistent manner on the replica's copy of the
> slot at a point where no vacuum changes to the catalog that could remove
> needed tuples have been replayed.

Right.

> The only way I can think of to do that really reliably right now, without
> full failover slots, is to use the newly committed pluggable WAL mechanism
> and add a hook to SaveSlotToPath() so slot info can be captured, injected
> in WAL, and replayed on the replica.

I personally think the primary answer is to use separate slots on
different machines. Failover slots can be an extension to that at some
point, but I think they're a secondary goal.

> It'd also be necessary to move
> CheckPointReplicationSlots() out of CheckPointGuts() to the start of a
> checkpoint/restartpoint when WAL writing is still permitted, like the
> failover slots patch does.

Ugh. That makes me rather wary.

> Basically, failover slots as a plugin using a hook, without the
> additions to base backup commands and the backup label.

I'm going to be *VERY* hard to convince that adding a hook inside
checkpointing code is acceptable.

> I'd really hate 9.6 to go out with - still - no way to use logical decoding
> in a basic, bog-standard HA/failover environment. It overwhelmingly limits
> their utility and it's becoming a major drag on practical use of the
> feature. That's a difficulty given that the failover slots patch isn't
> especially trivial and you've shown that lazy sync of slot state is not
> sufficient.

I think the right way to do this is to focus on failover for logical
rep, with separate slots. The whole idea of integrating this with physical
rep imo makes this a *lot* more complex than necessary. Not all that
many people are going to want both physical rep and logical rep.

> The restart_lsn from the newer copy of the slot is, as you said, a point we
> know we can reconstruct visibility info.

We can on the master. There's absolutely no guarantee that the
associated serialized snapshot is present on the standby.

Andres
