Re: A few nuances about specifying the timeline with START_REPLICATION

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: A few nuances about specifying the timeline with START_REPLICATION
Date: 2021-06-18 19:55:17
Message-ID: 484d05905c22c9ba5150bdde2511b161bb12c8aa.camel@j-davis.com
Lists: pgsql-hackers

On Fri, 2021-06-18 at 21:48 +0300, Heikki Linnakangas wrote:
> On 18/06/2021 20:27, Jeff Davis wrote:
> We could teach it to look into the timeline history to find the
> correct file, though.

That's how recovery_target_timeline behaves, and it would match my
intuition better if START_REPLICATION behaved that way.
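To make the idea concrete: "looking into the timeline history" means mapping a requested LSN to the timeline ID whose WAL actually contains it. Below is a minimal Python sketch, loosely modelled on what PostgreSQL's tliOfPointInHistory() does in C. The function names are hypothetical, and it assumes the usual history-file layout of one `parentTLI  switchpoint  reason` line per ancestor timeline.

```python
from typing import List, Optional, Tuple

# (tli, begin_lsn, end_lsn); end_lsn is None for the still-open target timeline.
Segment = Tuple[int, int, Optional[int]]

def parse_lsn(text: str) -> int:
    """Convert an LSN in 'X/Y' hexadecimal notation to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def parse_history(history: str, target_tli: int) -> List[Segment]:
    """Turn the history file of target_tli into (tli, begin, end) segments.

    Each non-comment line names a parent timeline and the switchpoint at
    which it ended; the next timeline in the chain begins at that LSN.
    """
    segments: List[Segment] = []
    begin = 0
    for line in history.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 2)  # parentTLI, switchpoint, free-text reason
        tli, switchpoint = int(fields[0]), parse_lsn(fields[1])
        segments.append((tli, begin, switchpoint))
        begin = switchpoint
    segments.append((target_tli, begin, None))
    return segments

def tli_of_point(segments: List[Segment], lsn: int) -> int:
    """Return the timeline ID whose segment contains lsn.

    A switchpoint belongs to the newer timeline, hence begin <= lsn < end.
    """
    for tli, begin, end in segments:
        if begin <= lsn and (end is None or lsn < end):
            return tli
    raise ValueError("LSN is not covered by this history")
```

For a timeline-3 history whose ancestors switched at 0/3000000 and 0/5000000, a request starting at 0/4000000 would be served from timeline 2's WAL, which is the same choice recovery makes when following recovery_target_timeline.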

> If the client asks for a historic timeline, the replication will stop
> when it reaches the end of that timeline. In hindsight, I think it would
> make more sense to send a message to the client to say that it's
> switching to a new timeline, and continue streaming from the new
> timeline.

Why is it important for the standby to be told explicitly in the
protocol about timeline switches? If it is important, why only for
historical timelines?
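For reference, here is a rough model of the behaviour Heikki describes: streaming on a historic timeline ends at its switchpoint, the server reports the next timeline ID and the switch LSN, and the client has to issue a new START_REPLICATION for that timeline. This is a simulation only, not a libpq client; the server history, the function names, and the choice to resume exactly at the switch LSN are simplifying assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical linear server history: (tli, begin_lsn, end_lsn), oldest first.
# end_lsn is None for the latest timeline, which has no switchpoint yet.
Segment = Tuple[int, int, Optional[int]]

def format_lsn(lsn: int) -> str:
    """Render a 64-bit LSN in the protocol's X/X hexadecimal notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"

def follow_timelines(history: List[Segment], start_lsn: int, tli: int) -> List[str]:
    """List the START_REPLICATION commands a client ends up issuing."""
    ends: Dict[int, Optional[int]] = {t: end for t, _, end in history}
    next_tli = {history[i][0]: history[i + 1][0] for i in range(len(history) - 1)}
    commands: List[str] = []
    lsn = start_lsn
    while True:
        commands.append(f"START_REPLICATION {format_lsn(lsn)} TIMELINE {tli}")
        end = ends[tli]
        if end is None:
            # Latest timeline: this stream simply keeps going.
            return commands
        # Historic timeline: the stream stops at the switchpoint; restart on
        # the reported next timeline (simplified: resume at the switch LSN).
        tli, lsn = next_tli[tli], end
```

Starting on timeline 1 against a server that is now on timeline 3 takes three separate commands; under Heikki's suggestion, the same walk would be a single stream carrying in-band "switching to timeline N" messages.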

> Hmm, the timeline in the START_REPLICATION command is not specifying a
> recovery target timeline, so I don't think "latest" or "current" make
> much sense there. Per above, it just tells the server which timeline the
> requested starting point belongs to, so it's actually redundant.

That's not very clear from the docs: "if TIMELINE option is specified,
streaming starts on timeline tli...".

Part of the confusion is that there's not a good distinction in
terminology between:
1. a timeline ID, which is a specific segment of a timeline
2. a timeline made up of the given timeline ID and all its
ancestors, terminating at the given ID
3. the timeline made up of the current ID, all ancestor IDs, and all
descendant IDs that the current active primary switches to
4. the set of all timelines that contain a given ID
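The distinction can be sketched with a toy timeline graph in which each timeline ID points to its parent. This is a hypothetical Python model, not PostgreSQL code: sense #2 is the ancestry chain ending at an ID, and sense #4 is every ID whose chain includes it. Sense #3 cannot be computed from the graph alone, because it depends on which descendant the active primary switches to in the future.

```python
from typing import Dict, List, Optional, Set

# Hypothetical timeline graph: timeline ID -> parent ID (None for the root).
Parents = Dict[int, Optional[int]]

def ancestry(parents: Parents, tli: int) -> List[int]:
    """Sense #2: the given ID plus all its ancestors, root first, ending at tli."""
    path: List[int] = []
    current: Optional[int] = tli
    while current is not None:
        path.append(current)
        current = parents[current]
    return list(reversed(path))

def timelines_containing(parents: Parents, tli: int) -> Set[int]:
    """Sense #4: every timeline ID whose ancestry includes tli."""
    return {t for t in parents if tli in ancestry(parents, t)}
```

With `{1: None, 2: 1, 3: 2, 4: 2}` (timeline 2 forked twice, say by two promoted standbys), `ancestry(parents, 3)` is `[1, 2, 3]` and `timelines_containing(parents, 2)` is `{2, 3, 4}`, while sense #3 starting from ID 2 follows whichever of 3 or 4 the current primary actually ended up on.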

It seems you are saying that replication only concerns itself with #3,
which does not require a timeline ID at all. That seems basically
correct for now, but since we already document the protocol to take a
timeline, it makes sense to me to just have the primary serve it if
possible.

If we (continue to?) allow timelines for replication, it will start to
treat the primary like an archive. That might not be quite what was
intended, but could be powerful. You could imagine a special archive
that implements the replication protocol, and have replicas stream
directly off the archive, or even do PITR off it.

Regards,
Jeff Davis
