Re: Timeline following for logical slots

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Timeline following for logical slots
Date: 2016-03-15 04:42:34
Message-ID: CAMsr+YG2eLDTp0DPmgCwnGJvXJGh69PRW7TEBh01z8tjt7frZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 15 March 2016 at 07:10, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:

> Petr Jelinek wrote:
> > On 04/03/16 17:08, Craig Ringer wrote:
> > >I'd really appreciate some review of the logic there by people who know
> > >timelines well and preferably know the xlogreader. It's really just one
> > >function and 2/3 comments; the code is simple but the reasoning leading
> > >to it is not.
> >
> > I think this will have to be work for committer at this point. I can't
> find
> > any flaws in the logic myself so I unless somebody protests I am going to
> > mark this as ready for committer.
>
> Great, thanks. I've studied this to the point where I'm confident that
> it makes sense, so I'm about to push it.
>

Thanks, though I'm happy enough to wait for Andres's input as he requested.

> Also, hopefully you don't have any future callers for the functions I
> marked static (I checked the failover slots series and couldn't see
> anything). I will push this early tomorrow.
>

I don't see it being overly useful for an extension, so that sounds fine.

> One thing I'm not quite sure about is why this is said to apply to
> "logical slots" and not just to replication slots in general. In what
> sense does it not apply to physical replication slots?
>

It's only useful for logical slots currently because physical slot
timeline following is done client side by the walreceiver. When the
walsender hits the end of the timeline it sends CopyDone and returns to
command mode. The client is expected to request the timeline history file,
parse it, work out which timeline to request next and specify that in its
next START_REPLICATION call.

None of that happens for logical slots. Andres was quite deliberate about
keeping timelines out of the interface for logical slots and logical
replication. I tried to retain that when adding the ability to follow
physical failover, keeping things simple for the client. Given that logical
replication slots are intended to be consumed by more than just a postgres
downstream it IMO makes sense to keep as much internal complexity hidden,
especially when dealing with something as surprisingly convoluted as
timelines.

This does mean that we can't tell if we replayed past the timeline switch
point on the old master before we switched to the new master, but I don't
think sending a timeline history file is the solution there. I'd rather
expose a walsender command and SQL function to let the client say, after
connecting, "I last replayed from timeline <x> up to point <y>, if I start
replaying from you will I get a consistent stream?". That can be done
separately to this patch and IMO isn't crucial since clients can currently
work it out, just with more difficulty.

Maybe physical rep can use the same logic, but I'm really not convinced. It
already knows how to follow timelines client-side and handles that as part
of walreceiver and archive recovery. Because recovery handles following the
timeline switches and walreceiver just fetches the WAL as an alternative to
archive fetches I'm not sure it makes sense; we still have to have all that
logic for archive recovery from restore_command, so doing it differently
for streaming just makes things more complicated for no clear benefit.

I certainly didn't want to address it in the same patch, much like I
intentionally avoided trying to teach pg_xlogdump to be able to follow
timelines. Partly to keep it simpler and focused, partly because it'd
require timeline.c to be made frontend-clean which means no List, no elog,
etc...

>
> (I noticed that we have this new line,
> - randAccess = true;
> + randAccess = true; /* allow readPageTLI to go
> backwards */
> in which now the comment is an identical copy of an identical line
> elsewhere; however, randAccess doesn't actually affect readPageTLI in
> any way, so AFAICS both comments are now wrong.)
>

Yeah, that's completely bogus. I have a clear image in my head of
randAccess being tested by ValidXLogPageHeader where it sanity
checks state->latestPageTLI, the TLI of the *prior* page, against the TLI
of the most recent page read, to make sure the TLI advances. But nope, I'm
apparently imagining things 'cos it just isn't there, it just tests to see
if we read a LSN later than the prior page instead.

> > Well for testing purposes it's quite fine I think. The TAP framework
> > enhancements needed for this are now in and it works correctly against
> > current master.
>
> Certainly the new src/test/recovery/t/006whatever.pl file is going to be
> part of the commit. What I'm not so sure about is
> src/test/modules/decoding_failover: is there any reason we don't just
> put the new functions in contrib/test_decoding?

Because contrib/test_decoding isn't an extension. It is a sample logical
decoding output plugin. I didn't want to mess it up by adding an extension
control file, extension script and some extension functions. Sure, you
*can* use the same shared library as an extension and as a logical decoding
output plugin but I think that makes it rather more confusing as an
example.

Also, because I don't want people reading test_decoding to find those
functions and think it's a good idea to actually use them. It's hidden away
in src/test/modules for a reason ;)

I have no particular objection to stripping the test module and the parts
of the t/ script associated with it if you don't think it's worthwhile.
I'd be happy to have it as an ongoing test showing that "hot" timeline
following on a logical slot does work, but the tests that use only base
backups are probably sufficient to detect obvious breakage/regressions.

If Michael Paquier's decoder_raw and receiver_raw were in contrib/ I'd add
support for doing it into reciever_raw without exposing SQL functions. They
aren't in core and can't form part of the core test suite, so I needed
something minimalist to prove the concept, and figured it might as well
serve as a regression test as well.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-03-15 04:45:12 Re: pglogical_output - a general purpose logical decoding output plugin
Previous Message Peter Geoghegan 2016-03-15 04:09:39 Re: amcheck (B-Tree integrity checking tool)