Re: pg_rewind exiting with error code 1 when source and target are on the same timeline

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: pg_rewind exiting with error code 1 when source and target are on the same timeline
Date: 2015-12-14 23:11:15
Message-ID: 937.1450134675@sss.pgh.pa.us
Lists: pgsql-bugs

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> On 12/3/15 11:10 PM, Michael Paquier wrote:
>> On Fri, Dec 4, 2015 at 12:22 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>>> After playing with this a bit, I think your patch is correct. The code
>>> has drifted a bit in the meantime, so attached is an updated patch.

>> Thanks for looking at it.

> I committed this to master. It's also on the 9.5 open item list, but if
> I backport it then the tests don't pass. Still looking. Not sure yet
> if this is because of code changes in pg_rewind master or test
> infrastructure changes in master.

I poked into this and found that the problem is that 9.5 is lacking the
hunks of commit e50cda78 that teach sanityChecks() to allow the control
file state to be DB_SHUTDOWNED_IN_RECOVERY, to wit

@@ -374,10 +380,11 @@ sanityChecks(void)
     /*
      * Target cluster better not be running. This doesn't guard against
      * someone starting the cluster concurrently. Also, this is probably more
-     * strict than necessary; it's OK if the master was not shut down cleanly,
-     * as long as it isn't running at the moment.
+     * strict than necessary; it's OK if the target node was not shut down
+     * cleanly, as long as it isn't running at the moment.
      */
-    if (ControlFile_target.state != DB_SHUTDOWNED)
+    if (ControlFile_target.state != DB_SHUTDOWNED &&
+        ControlFile_target.state != DB_SHUTDOWNED_IN_RECOVERY)
         pg_fatal("target server must be shut down cleanly\n");

     /*
@@ -385,75 +392,149 @@ sanityChecks(void)
      * server is shut down. There isn't any very strong reason for this
      * limitation, but better safe than sorry.
      */
-    if (datadir_source && ControlFile_source.state != DB_SHUTDOWNED)
+    if (datadir_source &&
+        ControlFile_source.state != DB_SHUTDOWNED &&
+        ControlFile_source.state != DB_SHUTDOWNED_IN_RECOVERY)
         pg_fatal("source data directory must be shut down cleanly\n");
 }

(Actually, only the second of these is critical to making the test
pass, but I think we should apply both of them if we apply either.)
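
For illustration only, the relaxed test in the hunks above can be sketched
as a standalone predicate. This is a minimal sketch, not the pg_rewind
source: the enum is a local stand-in assumed to mirror DBState from
src/include/catalog/pg_control.h, and the function names
is_cleanly_shut_down(), strict_check_passes() and source_check_passes()
are hypothetical helpers invented for this example.

```c
#include <stdbool.h>

/*
 * Local stand-in for DBState (assumed to mirror the enum in
 * src/include/catalog/pg_control.h; redefined here so the sketch
 * compiles on its own).
 */
typedef enum DBState
{
    DB_STARTUP = 0,
    DB_SHUTDOWNED,
    DB_SHUTDOWNED_IN_RECOVERY,
    DB_SHUTDOWNING,
    DB_IN_CRASH_RECOVERY,
    DB_IN_ARCHIVE_RECOVERY,
    DB_IN_PRODUCTION
} DBState;

/*
 * Hypothetical helper: the relaxed condition from the hunks above.
 * A node counts as cleanly shut down whether it was stopped as a
 * primary or while still in recovery.
 */
bool
is_cleanly_shut_down(DBState state)
{
    return state == DB_SHUTDOWNED ||
           state == DB_SHUTDOWNED_IN_RECOVERY;
}

/*
 * Hypothetical helper: the stricter condition on the "-" lines,
 * which accepts only DB_SHUTDOWNED.
 */
bool
strict_check_passes(DBState state)
{
    return state == DB_SHUTDOWNED;
}

/*
 * Hypothetical helper: the source-side check applies only when the
 * source is given as a data directory (datadir_source in the diff).
 */
bool
source_check_passes(bool have_source_datadir, DBState state)
{
    return !have_source_datadir || is_cleanly_shut_down(state);
}
```

With these definitions, a node whose control file reports
DB_SHUTDOWNED_IN_RECOVERY fails strict_check_passes() but passes
is_cleanly_shut_down(), which is the difference the two hunks make.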

If I apply these, without any of the rest of e50cda78, everything seems
fine. I'm going to go ahead and push that in the interests of getting
some buildfarm cycles on it; but if someone could confirm that this
is not an insane thing to do, it'd help.

regards, tom lane
