Replication origins conflate two separate functions

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>
Subject: Replication origins conflate two separate functions
Date: 2018-03-26 09:10:38
Message-ID: CAMsr+YEViEm8sshLeK6CZV1rx7XadiWf9t63aVA2wfyHDzxwcg@mail.gmail.com
Lists: pgsql-hackers

Hi folks

During some recent work with a plugin (pglogical) that uses replication
origins heavily, it's become apparent that replication origins conflate two
orthogonal features into one thing. There's "replication origins (session
replay progress tracking)" and "replication origins (per-transaction commit
origin tracking)".

TL;DR: it should be possible to set a replication origin per txn for
node-of-origin purposes, independently of the origin set on the session
for position-tracking purposes.

This comes up in a few places:

- Sometimes you're replaying a txn with both a proximate origin (the
immediate upstream) and an ultimate origin (the node that originally wrote
it). You should really store the ultimate origin as the commit's
replication origin for conflict resolution and similar purposes. But you
have set your session up with the proximate origin for replay position
tracking. If you set up your session and then change
replorigin_session_origin to the ultimate origin, *crash recovery will
update the wrong replication origin's position tracking*.

- It's not currently possible to use replication origins for replay
position tracking without also storing commit timestamp (committs) origin
data when track_commit_timestamp is on, because setting
replorigin_session_origin = InvalidRepOriginId turns off replorigin
handling entirely.

- Using replorigin_advance is no substitute because it cannot be
crash-safe: the advance is a separate step from the commit itself, so it
risks either skipping some changes or replaying some twice.
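
To illustrate the first point, here is a toy model in C of a commit record
that carries a single origin id used for both purposes. All names
(ToyCommitRecord, ToyProgress, toy_redo_commit and so on) are hypothetical
and are not PostgreSQL's real structures; this is only a sketch of why
recovery advances the wrong origin once the session origin has been
switched to the ultimate origin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy types; NOT PostgreSQL's real definitions. */
typedef uint16_t ToyOriginId;
typedef uint64_t ToyLsn;

#define TOY_INVALID_ORIGIN 0
#define TOY_MAX_ORIGINS 8

/* A commit record with one origin id, which serves both purposes. */
typedef struct ToyCommitRecord
{
    ToyOriginId origin_id;   /* node-of-origin AND progress-tracking key */
    ToyLsn      origin_lsn;  /* upstream LSN of the replayed commit */
} ToyCommitRecord;

/* Replay progress per origin, indexed by origin id. */
typedef struct ToyProgress
{
    ToyLsn remote_lsn[TOY_MAX_ORIGINS];
} ToyProgress;

/*
 * Toy crash-recovery step: advance the progress of whichever origin the
 * commit record names. It has no way to know that the id was meant only
 * as node-of-origin metadata.
 */
void
toy_redo_commit(ToyProgress *progress, const ToyCommitRecord *rec)
{
    if (rec->origin_id == TOY_INVALID_ORIGIN)
        return;             /* no origin set: nothing is tracked at all */

    assert(rec->origin_id < TOY_MAX_ORIGINS);

    if (rec->origin_lsn > progress->remote_lsn[rec->origin_id])
        progress->remote_lsn[rec->origin_id] = rec->origin_lsn;
}
```

With proximate origin 1 and ultimate origin 2, a record stamped with
origin 2 advances origin 2's progress during redo and leaves origin 1, the
one actually being replayed from, untouched.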

It hasn't been much of an issue because nobody's been doing a great deal
with parallelism, re-syncing tables, replication with hop distances other
than 1, etc. But pglogical does support resyncing tables, adding new tables
to existing subscriptions with an initial sync, cascading, etc, and we're
starting to run into these issues. They'll no doubt be a problem for Pg
core logical rep down the track, too.

For example, to resync a table pglogical makes a new slot for the copy,
does a COPY from the new slot's snapshot, then a post-COPY replay from the
replication slot, replaying only tuples for the table of interest, until it
has caught up with the main apply position.

The problem is that there is no crash-safe way to record both the initial
origin of the tuples (committs replication origin) *and* the replay
progress during post-COPY catchup of changes to the table. You have to
create a separate "origin for copy purposes" or similar, then keep track
of those extra origins. So say you're using a temp slot and do everything
in one txn. You don't need origins for position tracking then, but if you
set up the main session origin for your copy (for committs purposes) you
run into exclusive locking of origins, because the apply session already
has it locked. Even if you work around that, you still can't do multiple
parallel copies.

The tuples came from the same origin as the main apply process, so you want
that origin set on the committs (e.g. origin=1) for correct conflict
detection and reporting, etc.

To me this means that the commit record's replorigin info should separately
track the (origin,lsn) for replay progress tracking and the
(origin,committs) for xact info. It should be possible to record one or the
other independently. A session replorigin should not have to be set in
order to record the (origin,committs) info; it should only be needed for
replay position tracking.

AFAICS that comes down to one extra RepOriginId in commit records with
origins. There's no need for an extra flag bit; we'd just expand the
existing XLOG_INCLUDE_ORIGIN and add a member to the xl_xact_origin
subrecord.

Then we can support using origins for just position tracking, for just
commit timestamp origin metadata, or for both.
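
As a rough sketch of that split, the toy C model below gives the commit
record two independent origin ids, one for replay progress and one for
node-of-origin metadata. Every name here (SplitCommitRecord, SplitState,
split_redo_commit) is hypothetical, and the layout is an assumption made
for illustration; it is not the actual WAL record format or a proposed
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy types; NOT PostgreSQL's real definitions. */
typedef uint16_t SplitOriginId;
typedef uint64_t SplitLsn;

#define SPLIT_INVALID_ORIGIN 0
#define SPLIT_MAX_ORIGINS 8
#define SPLIT_MAX_XACTS 16

/* A commit record that tracks the two functions separately. */
typedef struct SplitCommitRecord
{
    uint32_t      xid;              /* index into the toy per-xact table */
    SplitOriginId progress_origin;  /* origin whose replay position moves */
    SplitLsn      progress_lsn;     /* upstream LSN for that origin */
    SplitOriginId commit_origin;    /* node-of-origin kept with committs */
} SplitCommitRecord;

typedef struct SplitState
{
    SplitLsn      remote_lsn[SPLIT_MAX_ORIGINS];   /* replay progress */
    SplitOriginId commit_origin[SPLIT_MAX_XACTS];  /* per-xact origin */
} SplitState;

/* Toy redo step: each field is applied on its own, if it is set. */
void
split_redo_commit(SplitState *state, const SplitCommitRecord *rec)
{
    assert(rec->xid < SPLIT_MAX_XACTS);
    assert(rec->progress_origin < SPLIT_MAX_ORIGINS);

    /* Position tracking: only when a progress origin was supplied. */
    if (rec->progress_origin != SPLIT_INVALID_ORIGIN &&
        rec->progress_lsn > state->remote_lsn[rec->progress_origin])
        state->remote_lsn[rec->progress_origin] = rec->progress_lsn;

    /*
     * Node-of-origin metadata: stored independently of the progress
     * origin. SPLIT_INVALID_ORIGIN here means no origin was recorded.
     */
    state->commit_origin[rec->xid] = rec->commit_origin;
}
```

In this model a table-copy worker could stamp its commits with
commit_origin = 1 while leaving progress_origin invalid, so it never
touches the position that the main apply session is tracking.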

No backward-compatibility-breaking changes would occur in the SQL UI
anywhere. I'd probably add an extra arg to
pg_replication_origin_xact_setup(...) that lets you set a per-xact origin,
and tweak how replorigin tracks state so you can use
pg_replication_origin_xact_setup without an active session origin.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
