From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | hlinnaka(at)iki(dot)fi |
Cc: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Satoshi Nagayasu <snaga(at)uptime(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Michael Paquier <mpaquier(at)vmware(dot)com> |
Subject: | Re: pg_rewind in contrib |
Date: | 2015-03-10 06:46:12 |
Message-ID: | CAA4eK1+yOG0H-U3xToa4N=av=d5WU25CQKH6jcREEGygciGDaw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Mar 9, 2015 at 7:32 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> Attached is a new patch version, fixing all the little things you listed.
I believe this is pretty much ready for commit. I'm going to read it
through myself one more time before committing, but I don't have anything
mind now that needs fixing anymore. I just pushed the change to split
dbcommands.h into dbcommands.h and dbcommands_xlog.h, as that seems like a
nice-to-have anyway.
>
Few assorted comments:
1.
+ <step>
+ <para>
+ Copy all those changed blocks from the new cluster to the old
cluster.
+
</para>
+ </step>
Isn't it possible incase of async replication that old cluster has
some blocks which new cluster doesn't have, what will it do
in such a case?
I have tried to test some form of such a case and it seems to be
failing with below error:
pg_rewind.exe -D ..\..\Data\ --source-pgdata=..\..\Database1
The servers diverged at WAL position 0/16DE858 on timeline 1.
Rewinding from last common checkpoint at 0/16B8A70 on timeline 1
could not open file "..\..\Data\/base/12706/16391" for truncation: No such
file
or directory
Failure, exiting
Exact scenario is:
Node -1 (master):
Step-1
Create table t1(c1 int, c2 char(500)) with (fillfactor=10);
insert into t1 values(generate_series(1,110),'aaaa');
Stop Node-1 (pg_ctl stop ..)
Step-2
Copy manually the data-directory (it contains WAL log as well)
to new location say Database1
Node-2 (standby)
Step-3
Change settings to make it stand-by (recovery.conf and change
postgresql.conf)
Start Node and verify all data exists.
Step-4
use pg_ctl promote to make Node-2 as master
Step-5
Start Node-1
insert few more records
insert into t1 values(generate_series(110,115),'aaaa');
Step-6
Node-2
Now insert one more records in table t1
insert into t1 values(116,'aaaa');
Stop both the nodes.
Now if I run pg_rewind on old-master(Node-1), it will lead to above error.
I think above scenario can be possible in async replication.
If I insert more records (greater than what I inserted in Step-5)
in Step-6, then pg_rewind works fine.
2.
diff --git a/src/bin/pg_rewind/RewindTest.pm
b/src/bin/pg_rewind/RewindTest.pm
+# To run a test, the test script (in t/ subdirectory) calls the functions
What do you mean by t/ subdirectory?
3.
+ <application>pg_rewind</> was run, and thereforce could not be copied by
typo /thereforce
4.
+static void
+sanityChecks(void)
+{
+ /* Check that there's no backup_label in either cluster */
I could not see such a check in code. Am I missing anything?
5.
+ /*
+ * TODO: move old file out of the way, if any. And perhaps create the
+ * file with temporary name first and rename in place after it's done.
+ */
+ snprintf(BackupLabelFilePath, MAXPGPATH,
+ "%s/backup_label" /* BACKUP_LABEL_FILE */, datadir_target);
+
There are couple of other TODO's in the patch, are these for future?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Beena Emerson | 2015-03-10 06:55:09 | Re: pg_trgm Memory Allocation logic |
Previous Message | Kyotaro HORIGUCHI | 2015-03-10 06:19:31 | Re: Performance improvement for joins where outer side is unique |