From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Refactor pg_rewind code and make it work against a standby |
Date: | 2020-11-20 14:19:03 |
Message-ID: | a5d96670-644c-dce8-3eda-06964c586453@iki.fi |
Lists: | pgsql-hackers |
On 20/11/2020 02:38, Andres Freund wrote:
> I locally, on a heavily modified branch (AIO support), started to get
> consistent failures in this test. I *suspect*, but am not sure, that
> it's the test's fault, not the fault of modifications.
>
> As far as I can tell, after the pg_rewind call, there's no guarantee
> that node_c has fully caught up to the 'in A, after C was promoted'
> insertion on node a. Thus at the check_query() I sometimes get just 'in
> A, before promotion' back.
>
> After adding a wait that problem seems to be fixed. Here's what I did
>
> diff --git i/src/bin/pg_rewind/t/007_standby_source.pl w/src/bin/pg_rewind/t/007_standby_source.pl
> index f6abcc2d987..48898bef2f5 100644
> --- i/src/bin/pg_rewind/t/007_standby_source.pl
> +++ w/src/bin/pg_rewind/t/007_standby_source.pl
> @@ -88,6 +88,7 @@ $node_c->safe_psql('postgres', "checkpoint");
> # - you need to rewind.
> $node_a->safe_psql('postgres',
> "INSERT INTO tbl1 VALUES ('in A, after C was promoted')");
> +$lsn = $node_a->lsn('insert');
>
> # Also insert a new row in the standby, which won't be present in the
> # old primary.
> @@ -142,6 +143,8 @@ $node_primary = $node_c;
> # Run some checks to verify that C has been successfully rewound,
> # and connected back to follow B.
>
> +$node_b->wait_for_catchup('node_c', 'replay', $lsn);
> +
> check_query(
> 'SELECT * FROM tbl1',
> qq(in A
Yes, I was able to reproduce that by inserting a strategic sleep in the
test and pausing replication by attaching gdb to the walsender process.
Pushed a fix similar to your patch, but I put the wait_for_catchup()
before running pg_rewind. The point of inserting the 'in A, after C was
promoted' row is that it's present in B when pg_rewind runs.
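
For illustration, here is a rough sketch of where such a wait could go. It is a fragment meant to sit inside 007_standby_source.pl, not a standalone script and not necessarily the exact lines that were pushed. It assumes the node objects from the test ($node_a, $node_b), that $lsn is already declared earlier in the file, and that B streams from A under the application_name 'node_b':

```perl
# Fragment of src/bin/pg_rewind/t/007_standby_source.pl (sketch only).
# $node_a and $node_b are the PostgresNode objects created earlier in the test.
$node_a->safe_psql('postgres',
	"INSERT INTO tbl1 VALUES ('in A, after C was promoted')");

# Remember how far A's WAL extends after the insert.
# ($lsn is assumed to be declared earlier in the test file.)
$lsn = $node_a->lsn('insert');

# Assumption: B is A's standby and connects as application_name 'node_b'.
# Block until B has replayed up to $lsn, so the new row is already in B
# when pg_rewind later uses B as the source.
$node_a->wait_for_catchup('node_b', 'replay', $lsn);

# ... the pg_rewind run against C, with B as the source, follows here ...
```

Waiting at this point, rather than after the rewind, keeps the test exercising what it was written for: the row has to be in B at the time pg_rewind copies from it.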
Thanks!
- Heikki