Re: Switching timeline over streaming replication

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: "'PostgreSQL-development'" <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Switching timeline over streaming replication
Date: 2012-09-28 13:08:26
Message-ID: 007701cd9d7a$59198cb0$0b4ca610$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Tuesday, September 25, 2012 6:29 PM Heikki Linnakangas wrote:
> On 25.09.2012 10:08, Heikki Linnakangas wrote:
> > On 24.09.2012 16:33, Amit Kapila wrote:
> >> In any case, it will be better if you can split it into multiple
> patches:
> >> 1. Having new functionality of "Switching timeline over streaming
> >> replication"
> >> 2. Refactoring related changes.

> Ok, here you go. xlog-c-split-1.patch contains the refactoring of existing
code, with no user-visible changes.
> streaming-tli-switch-2.patch applies over xlog-c-split-1.patch, and
contains the new functionality.

Please find the initial review of the patch. Still more review is pending,
but I thought whatever is done I shall post

Basic stuff:
----------------------
- Patch applies OK
- Compiles cleanly with no warnings
- Regression tests pass.
- Documentation changes are mostly fine.
- Basic replication tests works.

Testing
---------
Start primary server
Start standby server
Start cascade standby server

Stopped the primary server

Promoted the standby server with ./pg_ctl -D data_repl promote

In postgresql.conf file
archive_mode = off

The following logs are observing in the cascade standby server.

LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1
LOG: walreceiver ended streaming and awaits new instructions
LOG: record with zero length at 0/17E3888
LOG: re-handshaking at position 0/1000000 on tli 1
LOG: fetching timeline history file for timeline 2 from primary server
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1
LOG: walreceiver ended streaming and awaits new instructions
LOG: re-handshaking at position 0/1000000 on tli 1
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1

In postgresql.conf file
archive_mode = on

The following logs are observing in the cascade standby server.

LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1
LOG: walreceiver ended streaming and awaits new instructions
sh:
/home/amit/installation/bin/data_sub/pg_xlog/archive_status/0000000100000000
00000002: No such file or directory
LOG: record with zero length at 0/20144B8
sh:
/home/amit/installation/bin/data_sub/pg_xlog/archive_status/0000000100000000
00000002: No such file or directory
LOG: re-handshaking at position 0/2000000 on tli 1
LOG: fetching timeline history file for timeline 2 from primary server
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1
LOG: walreceiver ended streaming and awaits new instructions
sh:
/home/amit/installation/bin/data_sub/pg_xlog/archive_status/0000000100000000
00000002: No such file or directory
sh:
/home/amit/installation/bin/data_sub/pg_xlog/archive_status/0000000100000000
00000002: No such file or directory
LOG: re-handshaking at position 0/2000000 on tli 1
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1
LOG: walreceiver ended streaming and awaits new instructions

Verified that files are present in respective directories.

Code Review
----------------
1. In function readTimeLineHistory(),
two mechanisms are used to fetch timeline from history file
+ sscanf(fline, "%u\t%X/%X", &tli, &switchpoint_hi,
&switchpoint_lo);
+
+ /* expect a numeric timeline ID as first field of line */
+ tli = (TimeLineID) strtoul(ptr, &endptr, 0);
If we use new mechanism, it will not be able to detect error as it is
doing in current case.

2. In function readTimeLineHistory(),
+ fd = AllocateFile(path, "r");
+ if (fd == NULL)
+ {
+ if (errno != ENOENT)
+ ereport(FATAL,
+ (errcode_for_file_access(),
+ errmsg("could not open file
\"%s\": %m", path)));
+ /* Not there, so assume no parents */
+ return list_make1_int((int) targetTLI);
+ }
still return list_make1_int((int) targetTLI); is used.

3. Function timelineOfPointInHistory(), should return the timeline of recptr
passed to it.
a. is it okay to decide based on xlog recordpointer that which timeline
it belongs to, as different
timelines can have same xlog recordpointer?
b. it seems from logic that it will return timeline previous to the
timeline of recptr passed.
For example if the timeline 3's switchpoint is equal to recptr passed
then it will return timeline 2.

4. In writeTimeLineHistory function variable endTLI is never used.

5. In header of function writeTimeLineHistory(), can give explanation about
XLogRecPtr switchpoint

6. @@ -6869,11 +5947,35 @@ StartupXLOG(void)
*/
if (InArchiveRecovery)
{
+ char reason[200];
+


+ /*
+ * Write comment to history file to explain why and where
timeline
+ * changed. Comment varies according to the recovery target
used.
+ */
+ if (recoveryTarget == RECOVERY_TARGET_XID)
+ snprintf(reason, sizeof(reason),
+ "%s transaction %u",
+ recoveryStopAfter ? "after" :
"before",
+ recoveryStopXid);

In the comment above this line you mentioned why and where timeline changed.

However in the reason field only specifies about where part.

7. + * Returns the redo pointer of the "previous" checkpoint.
+GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli)
+{
+ if (InRedo)
+ {
+ LWLockAcquire(ControlFileLock, LW_SHARED);
+ *oldrecptr = ControlFile->checkPointCopy.redo;
+ *oldtli = ControlFile->checkPointCopy.ThisTimeLineID;
+ LWLockRelease(ControlFileLock);
+ }

a. In this function, is it required to take ControlFileLock as earlier also
there was no lock to protect this read
when it get called from RestoreArchivedFile, and I think at this point no
one else can modify these values.
However for code consistency purpose like whenever or wherever read the
controlfile values, read it with read lock.

b. As per your comment it should have returned "previous" checkpoint,
however the function returns values of
latest checkpoint.

8. In function writeTimeLineHistoryFile(), will it not be better to directly
write rather than to later do pg_fsync().
as it is just one time write.

9. +XLogRecPtr
+timeLineSwitchPoint(XLogRecPtr startpoint, TimeLineID tli)
..
..
+ * starting point. This is because the client can
legimately
spelling of legitimately needs to be corrected.

10.+XLogRecPtr
+timeLineSwitchPoint(XLogRecPtr startpoint, TimeLineID tli)
..
..
+ if (tli < ThisTimeLineID)
+ {
+ if (!nexttle)
+ elog(ERROR, "could not find history entry for child
of timeline %u", tli); /* shouldn't happen */
+ }

I don't understand the meaning of the above check, as I think this situation
can occur
when this function gets called from StartReplication, because always tli
sent by standby to new
master will be less than ThisTimeLineID and it can be first in list.

Documentation
---------------
1. In explanation of TIMELINE_HISTORY:
Filename of the timeline history file. This is always of the form
[insert correct example here].
Give example.
2. In protocol.sgml change, I feel better explain when the COPYDONE message
will be initiated.

With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2012-09-28 13:33:27 Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Previous Message Andrew Dunstan 2012-09-28 13:03:13 Re: data to json enhancements