Re: Switching timeline over streaming replication

From: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: "'PostgreSQL-development'" <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Switching timeline over streaming replication
Date: 2012-10-01 02:25:17
Message-ID: 6C0B27F7206C9E4CA54AE035729E9C3828535586@szxeml509-mbs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Friday, September 28, 2012 6:38 PM Amit Kapila wrote:
> On Tuesday, September 25, 2012 6:29 PM Heikki Linnakangas wrote:
> On 25.09.2012 10:08, Heikki Linnakangas wrote:
> > On 24.09.2012 16:33, Amit Kapila wrote:
> >> In any case, it will be better if you can split it into multiple
> patches:
> >> 1. Having new functionality of "Switching timeline over streaming
> >> replication"
> >> 2. Refactoring related changes.

> Ok, here you go. xlog-c-split-1.patch contains the refactoring of existing
code, with no user-visible changes.
> streaming-tli-switch-2.patch applies over xlog-c-split-1.patch, and
contains the new functionality.

> Please find the initial review of the patch. Still more review is pending,
> but I thought whatever is done I shall post

Some more review:

11. In function readTimeLineHistory()
ereport(DEBUG3,
(errmsg_internal("history of timeline %u is %s",
targetTLI, nodeToString(result))));

Don't nodeToString(result) needs to be changed as it contain list of structure TimeLineHistoryEntry

12. In function @@ -3768,6 +3773,8 @@ rescanLatestTimeLine(void)
+ * The current timeline must be found in the history file, and the
+ * next timeline must've forked off from it *after* the current
+ * recovery location.
*/
- if (!list_member_int(newExpectedTLIs,
- (int) recoveryTargetTLI))
- ereport(LOG,
- (errmsg("new timeline %u is not a child of database system timeline %u",
- newtarget,
- ThisTimeLineID)));

is there any logic in the current patch which ensures that above check is not require now?

13. In function @@ -3768,6 +3773,8 @@ rescanLatestTimeLine(void)
+ found = false;
+ foreach (cell, newExpectedTLIs)
..
..
- list_free(expectedTLIs);
+ list_free_deep(expectedTLIs);
whats the use of the found variable and freeing expectedTLIs in loop might cause problem.

14. In function @@ -3768,6 +3773,8 @@ rescanLatestTimeLine(void)
newExpectedTLIs = readTimeLineHistory(newtarget);
Shouldn't this variable be declared as newExpectedTLEs as the list returned by readTimeLineHistory contains target list entry

15. StartupXLOG
/* Now we can determine the list of expected TLIs */
expectedTLIs = readTimeLineHistory(recoveryTargetTLI);

Should expectedTLIs be changed to expectedTLEs as the list returned by readTimeLineHistory contains target list entry

16.@@ -5254,8 +5252,8 @@ StartupXLOG(void)
writeTimeLineHistory(ThisTimeLineID, recoveryTargetTLI,
- curFileTLI, endLogSegNo, reason);
+ curFileTLI, EndRecPtr, reason);
if endLogSegNo is not used here, it needs to be removd from function declaration as well.

17.@@ -5254,8 +5252,8 @@ StartupXLOG(void)
if (InArchiveRecovery)
..
..
+
+ /* The history file can be archived immediately. */
+ TLHistoryFileName(histfname, ThisTimeLineID);
+ XLogArchiveNotify(histfname);

Shouldn't this be done archive_mode is on. Right now InArchiveRecovery is true even when we do recovery for standby

18. +static bool
+WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, bool fetching_ckpt)
{
..
+ if (XLByteLT(RecPtr, receivedUpto))
+ havedata = true;
+ else
+ {
+ XLogRecPtr latestChunkStart;
+
+ receivedUpto = GetWalRcvWriteRecPtr(&latestChunkStart, &receiveTLI);
+ if (XLByteLT(RecPtr, receivedUpto) && receiveTLI == curFileTLI)
+ {
+ havedata = true;
+ if (!XLByteLT(RecPtr, latestChunkStart))
+ {
+ XLogReceiptTime = GetCurrentTimestamp();
+ SetCurrentChunkStartTime(XLogReceiptTime);
+ }
+ }
+ else
+ havedata = false;
+ }

In the above logic, it seems there is inconsistency in setting havedata = true;
In the above code in else loop, let us say cond. receiveTLI == curFileTLI is false but XLByteLT(RecPtr, receivedUpto) is true,
then in next round in for loop, the check if (XLByteLT(RecPtr, receivedUpto)) will get true and will set havedata = true;
which seems to be contradictory.

19.

+static bool
+WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, bool fetching_ckpt)
{
..
+ if (PrimaryConnInfo)
+ {
+ XLogRecPtr ptr = fetching_ckpt ? RedoStartLSN : RecPtr;
+ TimeLineID tli = timelineOfPointInHistory(ptr, expectedTLIs);
+
+ if (tli < curFileTLI)

I think in some cases if (tli < curFileTLI) might not make sense, as for case where curFileTLI =0 for randAccess.

20. Function name WaitForWALToBecomeAvailable suggests that it waits for WAL, but it also returns true when trigger file is present,
which can be little misleading.

21. @@ -2411,27 +2411,6 @@ reaper(SIGNAL_ARGS)

a. won't it impact stop of online basebackup functionality because earlier on promotion
I think this code will stop walsenders and basebackup stop will also give error in such cases.

22. @@ -63,10 +66,17 @@ void
_PG_init(void)
{
/* Tell walreceiver how to reach us */
- if (walrcv_connect != NULL || walrcv_receive != NULL ||
- walrcv_send != NULL || walrcv_disconnect != NULL)
+ if (walrcv_connect != NULL || walrcv_identify_system ||
+ walrcv_readtimelinehistoryfile != NULL ||

check for walrcv_identify_system != NULL is missing.

23. write the function header for newly added functions (libpqrcv_startstreaming, libpqrcv_identify_system, ..)

24. In header of function libpqrcv_receive(), *type needs to be removed.
+ * If data was received, returns the length of the data. *type and *buffer

25.
+timeline_history:
+ K_TIMELINE_HISTORY ICONST
+ {
+ TimeLineHistoryCmd *cmd;
+
+ cmd = makeNode(TimeLineHistoryCmd);
+ cmd->timeline = $2;

can handle invalid timeline error same as for opt_timeline

26.@@ -170,6 +187,7 @@ WalReceiverMain(void)
+ case WALRCV_WAITING:
+ case WALRCV_STREAMING:
/* Shouldn't happen */
elog(PANIC, "walreceiver still running according to shared memory state");
elog message should be changed according to new states.

27.@@ -259,8 +281,11 @@ WalReceiverMain(void)

/* Load the libpq-specific functions */
load_file("libpqwalreceiver", false);
- if (walrcv_connect == NULL || walrcv_receive == NULL ||
- walrcv_send == NULL || walrcv_disconnect == NULL)
+ if (walrcv_connect == NULL || walrcv_startstreaming == NULL ||
+ walrcv_endstreaming == NULL ||
+ walrcv_readtimelinehistoryfile == NULL ||
+ walrcv_receive == NULL || walrcv_send == NULL ||
+ walrcv_disconnect == NULL)

check for walrcv_identify_system is missing.

28.
+/*
+ * Check that we're connected to a valid server using the IDENTIFY_SERVER
+ * replication command, and fetch any timeline history files present in the
+ * master but missing from this server's pg_xlog directory.
+ */
+static void
+WalRcvHandShake(TimeLineID startpointTLI)

In the function header the command name should be IDENTIFY_SYSTEM instead of IDENTIFY_SERVER

29. @@ -170,6 +187,7 @@ WalReceiverMain(void)
+ * timeline. In case we've already reached the end of the old timeline,
+ * the server will finish the streaming immediately, and we will
+ * disconnect. If recovery_target_timeline is 'latest', the startup
+ * process will then pg_xlog and find the new history file, bump recovery

a. I think after reaching at end of old timeline rather than discoonect it will start from new timeline,
which will be updated by startup process.
b. The above line seems to be incorrect, "will then (scan/search) pg_xlog"

30. +static void
+WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)

/*
+ * nudge startup process to notice that we've stopped streaming and are
+ * now waiting for instructions.
+ */
+ WakeupRecovery();
for (;;)
{

In this for loop don't we need to check interrupts or postmaster alive or recovery in progress
so that if any other process continues, it should not wait indefinately.

31.+static void
+WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)

/*
+ * nudge startup process to notice that we've stopped streaming and are
+ * now waiting for instructions.
+ */
+ WakeupRecovery();
for (;;)
{

+ SpinLockAcquire(&walrcv->mutex);
+ if (walrcv->walRcvState == WALRCV_STARTING)
{

I think it can reach WALRCV_STOPPING state also after WALRCV_WAITING from shutdown,
so we should check for that state as well.

32.@@ -170,6 +187,7 @@ WalReceiverMain(void)
{
..
..

+ elog(LOG, "walreceiver ended streaming and awaits new instructions");
+ WalRcvWaitForStartPosition(&startpoint, &startpointTLI);

a. After getting new startpoint and tli, it will go again for WalRcvHandShake(startpointTLI);
so here chances are there, it will again fetch the history files from server which we have
already fetched.
b. Also Identify_system command will run again and get the information such as system identifier,
which is completely redundant at this point.
It fetches tli from primary also which I think can't be changed from what earlier it has fetched.

33. .@@ -170,6 +187,7 @@ WalReceiverMain(void)
+ for (;;)
+ {
+ if (len > 0)
+ XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
+ else if (len == 0)
+ break;
+ else if (len < 0)
+ {
+ ereport(LOG,
+ (errmsg("replication terminated by primary server"),
+ errdetail("End of WAL reached on timeline %u", startpointTLI)));
+ endofwal = true;
+ break;
+ }
+ len = walrcv_receive(0, &buf);
+ }
+
+ /* Let the master know that we received some data. */
+ XLogWalRcvSendReply();
+
+ /*
+ * If we've written some records, flush them to disk and let the
+ * startup process and primary server know about them.
+ */
+ XLogWalRcvFlush(false);

a. In the above code in for loop, when it breaks due to len < 0, there is no need to send reply to master.
b. also when it breaks due to len < 0, there can be 2 reasons, one is end of copy mode or primary server has
disconnected. I think in second case handling should be same as what without this feature.
Not sure if its eventually turning out to be same.

34.
+bool
+WalRcvStreaming(void)
+{

In this function, right now if state is WALRCV_WAITING, then it will return false.
I think waiting is better than starting for the matter of checking if walreceiver is in progress.
or is state WALRCV_WAITING anytime expected when this function is called, if not then we log the
error for invalid state.

With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dan Scott 2012-10-01 03:04:24 Re: Extending range of to_tsvector et al
Previous Message Jeff Davis 2012-10-01 02:19:04 Re: WIP checksums patch