Re: More subtle issues with cascading replication over timeline switches

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: More subtle issues with cascading replication over timeline switches
Date: 2013-01-21 10:51:56
Message-ID: 50FD1DCC.5070608@vmware.com
Lists: pgsql-hackers

On 19.01.2013 14:26, Amit kapila wrote:
> On Friday, January 18, 2013 5:27 PM Heikki Linnakangas wrote:
>
>
>> Indeed, looking at the pg_xlog, it's not there (I did a couple of extra
>> timeline switches):
>
>> ~/pgsql.master$ ls -l data-master/pg_xlog/
>> total 131084
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000010000000000000001
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000010000000000000002
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000010000000000000003
>> -rw------- 1 heikki heikki 41 Jan 18 13:38 00000002.history
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000020000000000000003
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000020000000000000004
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000020000000000000005
>> -rw------- 1 heikki heikki 83 Jan 18 13:38 00000003.history
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000030000000000000005
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000030000000000000006
>> drwx------ 2 heikki heikki 4096 Jan 18 13:38 archive_status
>> ~/pgsql.master$ ls -l data-standbyB/pg_xlog/
>> total 81928
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000010000000000000001
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000010000000000000002
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000020000000000000003
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000020000000000000004
>> -rw------- 1 heikki heikki 83 Jan 18 13:38 00000003.history
>> -rw------- 1 heikki heikki 16777216 Jan 18 13:38 000000030000000000000005
>> drwx------ 2 heikki heikki 4096 Jan 18 13:38 archive_status
>
>> This can be thought of as another variant of the same issue that was
>> fixed by commit 60df192aea0e6458f20301546e11f7673c102101. When standby B
>> scans for the latest timeline, it finds it to be 3, and it reads the
>> timeline history file for 3. After that patch, it also saves it in
>> pg_xlog. It doesn't save the timeline history file for timeline 2,
>> because that's included in the history of timeline 3. However, when
>> standby C connects, it will try to fetch all the history files that it
>> doesn't have, including 00000002.history, which throws the error.
>
> Is the file 00000002.history really required by standby C for any useful purpose?

No, not really.
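Standby C has no real use for 00000002.history because the history file of timeline 3 already lists every timeline that timeline 3 descends from. The following minimal Python sketch shows this. It assumes the tab-separated history-file layout (parent TLI, switch point, reason), with blank lines and '#' comments skipped. The helper names and the switch-point values in the sample are hypothetical and are not taken from the listing above.

```python
def history_file_name(tli: int) -> str:
    """History files are named after the timeline ID as 8 uppercase hex digits."""
    return f"{tli:08X}.history"


def parse_history(text: str) -> list:
    """Return (parent_tli, switchpoint) pairs from a timeline history file.

    Assumes the layout "parentTLI<TAB>switchpoint<TAB>reason" per line,
    skipping blank lines and '#' comment lines.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        entries.append((int(fields[0]), fields[1]))
    return entries


def ancestors(history_text: str) -> list:
    """Timelines that the history file's own timeline descends from."""
    return [tli for tli, _ in parse_history(history_text)]


# Hypothetical contents of 00000003.history; the switch points are made up.
HISTORY_3 = (
    "1\t0/3000090\tno recovery target specified\n"
    "2\t0/5000090\tno recovery target specified\n"
)

print(history_file_name(3), ancestors(HISTORY_3))  # 00000003.history [1, 2]
```

Timeline 2 shows up as an ancestor without reading 00000002.history at all. That is why standby B could skip saving it.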

> Can we change the current design so that when standby C connects, if some old history file (like 00000002.history)
> is not present, it simply ignores it and continues?

That would be possible too, with some rejiggering of the code. At the
moment, however, the code to find the latest timeline works by checking
the existence of timeline history files in order. So it first checks for
00000002.history, then 00000003.history, then 00000004.history and so
on, until it gets a file-not-found. That logic doesn't work if there are
gaps in the sequence. So I'm inclined to just make sure the history
files are always copied. I think it's good to have them around anyway,
for debugging purposes.
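The following rough Python sketch illustrates the gap problem. It models pg_xlog as a set of file names. The function names are hypothetical and this is not the actual PostgreSQL code:

- `newest_timeline_sequential` mimics the probe-in-order behaviour described above.
- `newest_timeline_gap_tolerant` is one possible shape of the alternative Amit suggests.
- `files_to_fetch` loosely models a cascading standby asking for every history file it lacks.

```python
def history_file_name(tli: int) -> str:
    """History file name for a timeline: 8 uppercase hex digits + '.history'."""
    return f"{tli:08X}.history"


def newest_timeline_sequential(start_tli: int, files: set) -> int:
    """Probe start_tli+1, start_tli+2, ... and stop at the first missing file."""
    newest = start_tli
    probe = start_tli + 1
    while history_file_name(probe) in files:
        newest = probe
        probe += 1
    return newest


def newest_timeline_gap_tolerant(start_tli: int, files: set) -> int:
    """Hypothetical alternative: highest TLI among the history files present."""
    tlis = [int(name[:8], 16) for name in files if name.endswith(".history")]
    return max([start_tli] + tlis)


def files_to_fetch(newest_tli: int, local: set) -> list:
    """History files for TLIs 2..newest_tli that are missing locally.

    Timeline 1 never has a history file, so the range starts at 2.
    """
    return [history_file_name(t) for t in range(2, newest_tli + 1)
            if history_file_name(t) not in local]


# History files present in the two pg_xlog listings quoted above.
master = {"00000002.history", "00000003.history"}
standby_b = {"00000003.history"}

print(newest_timeline_sequential(1, master))       # 3
print(newest_timeline_sequential(1, standby_b))    # 1: stops at the gap at TLI 2
print(newest_timeline_gap_tolerant(1, standby_b))  # 3

# A fresh standby C asks B for every history file it lacks; B cannot serve 2.
wanted = files_to_fetch(3, set())
print([f for f in wanted if f not in standby_b])   # ['00000002.history']
```

Always copying the history files keeps the set contiguous, so the simple sequential probe stays correct without the gap-tolerant rewrite.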

- Heikki
