Re: Outdated replication protocol error?

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
Subject: Re: Outdated replication protocol error?
Date: 2021-06-18 01:13:57
Message-ID: dcef9eaae8fc6d8f6f260ff3d051c197cdd29a8e.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2021-06-16 at 16:17 -0700, Andres Freund wrote:
> I think we should explicitly compute the current timeline before using
> ThisTimelineID. E.g. in StartReplication() call a new version of
> GetFlushRecPtr() that also returns the current timeline id.

I think all we need to do is follow the pattern in IdentifySystem() by
calling:

am_cascading_walsender = RecoveryInProgress();

first. There are three cases:

1. If the server was a primary the last time RecoveryInProgress() was
called, and it's still a primary, then the call returns false
immediately. ThisTimeLineID should already be set properly.

2. If the server was a secondary the last time RecoveryInProgress() was
called, and now it's a primary, then the call updates ThisTimeLineID
via InitXLOGAccess() and returns false.

3. If the server was a secondary the last time, and is still a
secondary, then the call returns true. StartReplication() will then
call GetStandbyFlushRecPtr(), which updates ThisTimeLineID.
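
To make the three cases easier to follow, here is a toy model of the
control flow in C. It is only a sketch, not PostgreSQL source: the
function and variable names are borrowed from the discussion above, but
their bodies, the shared_* variables, the toy_* helpers and
start_replication_tli() are simplified assumptions made up for
illustration.

```c
#include <stdbool.h>

typedef unsigned int TimeLineID;

/* Hypothetical stand-ins for shared server state. */
static bool shared_in_recovery = true;    /* is the server a standby right now? */
static TimeLineID shared_insert_tli = 1;  /* timeline a primary writes on */
static TimeLineID shared_replay_tli = 1;  /* timeline a standby is replaying */

/* Per-backend state, named after the variables in the discussion. */
static bool local_recovery_in_progress = true;
static TimeLineID ThisTimeLineID = 0;
static bool am_cascading_walsender = false;

/* Toy version: copy the primary's timeline into the backend-local variable. */
static void InitXLOGAccess(void)
{
    ThisTimeLineID = shared_insert_tli;
}

/* Toy version: copy the standby's replay timeline into the local variable. */
static void GetStandbyFlushRecPtr(void)
{
    ThisTimeLineID = shared_replay_tli;
}

/* Toy version of the three-way behaviour described above. */
static bool RecoveryInProgress(void)
{
    /* Case 1: already known to be a primary, nothing to refresh. */
    if (!local_recovery_in_progress)
        return false;

    /* Otherwise re-read the shared state. */
    local_recovery_in_progress = shared_in_recovery;

    /* Case 2: promoted since the last call, so pick up the new timeline. */
    if (!local_recovery_in_progress)
        InitXLOGAccess();

    /* Case 3 when this is still true. */
    return local_recovery_in_progress;
}

/* Hypothetical helpers that change the simulated server state. */
void toy_standby_replays(TimeLineID tli)
{
    shared_in_recovery = true;
    shared_replay_tli = tli;
}

void toy_promote(TimeLineID new_tli)
{
    shared_in_recovery = false;
    shared_insert_tli = new_tli;
}

/* Hypothetical reduction of StartReplication() to its timeline logic. */
TimeLineID start_replication_tli(void)
{
    am_cascading_walsender = RecoveryInProgress();
    if (am_cascading_walsender)
        GetStandbyFlushRecPtr();   /* case 3: standby refreshes the timeline */
    return ThisTimeLineID;
}
```

In this model, calling start_replication_tli() while the server is a
standby, again right after toy_promote(), and then once more exercises
cases 3, 2 and 1 in that order, and ThisTimeLineID is current after each
call.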

This confused me for a while. I was trying to sort out whether
am_cascading_walsender and/or ThisTimeLineID could be out of date, how
that would lead to ThisTimeLineID not being updated, and why
IdentifySystem() and StartReplication() differed.

Simple patch attached. I haven't tested it yet, but I wanted to post my
analysis.

Regards,
Jeff Davis

Attachment Content-Type Size
fix-start-replication-identify-system.diff text/x-patch 912 bytes
