From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_rewind test race condition..? |
Date: | 2015-04-29 00:36:13 |
Message-ID: | 5540277D.8020309@iki.fi |
Lists: | pgsql-hackers |
On 04/28/2015 11:02 AM, Stephen Frost wrote:
> Heikki,
>
> Not sure if anyone else is seeing this, but I'm getting regression
> test failures when running the pg_rewind tests pretty consistently
> with 'make check'. Specifically with "basic remote", I'm getting:
>
> source and target cluster are on the same timeline
> Failure, exiting
>
> in regress_log/pg_rewind_log_basic_remote.
>
> If I throw a "sleep(5);" into t/001_basic.pl before the call to
> RewindTest::run_pg_rewind($test_mode); then everything works fine.
The problem seems to be that when the standby is promoted, it's a
so-called "fast promotion", where it writes an end-of-recovery record
and starts accepting queries before creating a real checkpoint.
pg_rewind looks at the TLI of the latest checkpoint, as recorded in the
control file, but that isn't updated until the checkpoint completes. I
don't see it on my laptop normally, but I can reproduce it if I insert a
"sleep(5)" in StartupXLOG, just before it requests the checkpoint:
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7173,7 +7173,10 @@ StartupXLOG(void)
 	 * than is appropriate now that we're not in standby mode anymore.
 	 */
 	if (fast_promoted)
+	{
+		sleep(5);
 		RequestCheckpoint(CHECKPOINT_FORCE);
+	}
 }
The simplest fix would be to force a checkpoint in the regression test,
before running pg_rewind. It's a bit of a cop-out, since you'd still get
the same issue if you tried to do the same thing in the real world. It
should be rare in practice - you'd not normally run pg_rewind
immediately after promoting the standby - but at least a better error
message would be nice.
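To make the ordering problem concrete, here is a toy model in C. It is only a sketch of the reasoning above, not PostgreSQL code: ToyCluster and the toy_* functions are made-up names, and the model assumes that promotion bumps the running timeline at once while the control file's copy changes only when a checkpoint completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cluster; not a PostgreSQL structure. */
typedef struct
{
	unsigned	current_tli;		/* timeline the running server is on */
	unsigned	control_file_tli;	/* TLI of the latest checkpoint, as
									 * recorded in the control file */
} ToyCluster;

/*
 * Model of a fast promotion: the server switches to a new timeline and
 * starts accepting queries, but no checkpoint has completed yet, so the
 * control file still carries the old TLI.
 */
static void
toy_fast_promote(ToyCluster *c)
{
	assert(c != NULL);
	c->current_tli++;
}

/* Model of a completed checkpoint: the control file catches up. */
static void
toy_checkpoint(ToyCluster *c)
{
	assert(c != NULL);
	c->control_file_tli = c->current_tli;
}

/*
 * Model of the check that produces "source and target cluster are on the
 * same timeline": only the control-file TLIs are compared.
 */
static bool
toy_same_timeline(const ToyCluster *source, const ToyCluster *target)
{
	assert(source != NULL && target != NULL);
	return source->control_file_tli == target->control_file_tli;
}
```

In this model, calling toy_fast_promote() on the standby and then toy_same_timeline() right away still reports the same timeline, which corresponds to the failure in the test. Calling toy_checkpoint() on the promoted standby first, as the test fix would, makes the two clusters compare as different.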
- Heikki