Re: Timeline following for logical slots

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Timeline following for logical slots
Date: 2016-03-04 16:08:52
Message-ID: CAMsr+YEPdRkoZA+uN90vSmXT9GYj0UFaYMn9=O11XW4L3cvosg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1 March 2016 at 21:00, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> Hi all
>
> Per discussion on the failover slots thread (
> https://commitfest.postgresql.org/9/488/) I'm splitting timeline
> following for logical slots into its own separate patch.
>
>
I've updated the logical decoding timeline following patch to fix a bug
found as a result of test development related to how Pg renames the last
WAL seg on the old timeline to suffix it with .partial on promotion. The
xlogreader must switch to reading from the newest-timeline version of a
given segment eagerly, for the first page of the segment, since that's the
only one guaranteed to actually exist.

I'd really appreciate some review of the logic there by people who know
timelines well and preferably know the xlogreader. It's really just one
function and 2/3 comments; the code is simple but the reasoning leading to
it is not.

I've also attached an updated version of the tests posted a few days ago.
The tests depend on the remaining patches from the TAP enhancements tree so
it's easiest to just get the whole tree from
https://github.com/2ndQuadrant/postgres/tree/dev/logical-decoding-timeline-following
(subject to regular rebases and force pushes, do not use as a base).

The tests now include a test module that exposes some slots guts to SQL to
allow the client to sync slot state from master to replica(s) without
needing failover slots and the use of extra WAL as transport. It's very
much for-testing-only.

The new test module is used by a second round of tests to demonstrate the
practicality of failover of a logical replication client to a physical
replica using a base backup taken by pg_basebackup and without the presence
of failover slots. I won't pretend it's pretty.

This proves that the approach works barring unforseen showstoppers. It also
proves it's pretty ugly - failover slots provide a much, MUCH simpler and
safer way for clients to achieve this with way less custom code needed by
each client to sync slot state.

I've got a bit of cleanup to do in the test suite and a few more tests to
write for cases where the slot on the replica is allowed to fall behind the
slot on the master but this is mostly waiting on the remaining two TAP test
patches before it can be evaluated for possible push.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Allow-logical-slots-to-follow-timeline-switches.patch text/x-patch 20.3 KB
0002-Tests-for-logical-decoding-timeline-following.patch text/x-patch 23.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-03-04 16:09:18 Re: silent data loss with ext4 / all current versions
Previous Message Robert Haas 2016-03-04 16:08:37 Re: postgres_fdw vs. force_parallel_mode on ppc