Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectly generates WAL segment created during backup

From: Andre Piwoni <apiwoni(at)webmd(dot)net>
To: Maksim Milyutin <milyutinma(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15638: pg_basebackup with --wal-method=stream incorrectly generates WAL segment created during backup
Date: 2019-02-19 18:08:23
Message-ID: CAEC-+VG=OkuqC4S1COP_MitP+nd82AwU+Nydz_ZhTOTuYsErTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

By the way, you can assert this structure, including now explained diff in
WAL segment taken during backup, for fetch vs stream WAL method by creating
master and then creating new slave by taking base backup using both methods:

pg_basebackup -h new_slave -D /var/lib/pgsql/10/data -v -P -R -U
replication --wal-method=stream
pg_basebackup -h new_slave -D /var/lib/pgsql/10/data -v -P -R -U
replication --wal-method=fetch

Please note to see these differences I don't even have to do failover,
however, these structural (.done extension) differences seem to affect
subsequent failover. I seems that promoted slave is utilizing it's own
pg_wal directory with archive status to determine what should be archived
next.

On Tue, Feb 19, 2019 at 9:25 AM Andre Piwoni <apiwoni(at)webmd(dot)net> wrote:

> I call pg_ctl -D /var/lib/pgsql/10/data promote to upgrade slave to master
> when failover happens. archive_mode is set to "on" and not "always".
> I repoint slave to the master by stopping it, updating recovery.conf and
> restarting it. Let me know if I'm doing it wrong.
>
> I think this problem is created before promotion when new slave is created
> using pg_basebackup with --wal-method=stream and manifests when actual
> promotion happens.
> What I'm trying to say it does not seem that .partial extension is the
> issue here but lack of .done extension.
>
> Wouldn't you agree that given directory contents below, when failover
> happens and new slave is promoted to the primary then with fetch mode
> master would should not archive 000000010000000000000002 segment because it
> is marked as .done but in stream method segment is not marked as .done so
> it would be attempted to be archived again? This obviously fails because
> segment has been archived by initial master. If you disagree, can you
> explain the purpose behind .done extension in pg_wal/archive_status
> directory?
>
> Below are contents of directories after base backup but before promoting
> new slave to master status..
>
> PRIMARY
> /var/lib/pgsql/10/data/pg_wal/:
> total 49156
> -rw-------. 1 postgres postgres 16777216 Feb 15 13:39
> 000000010000000000000001
> -rw-------. 1 postgres postgres 16777216 Feb 15 13:40
> 000000010000000000000002
> -rw-------. 1 postgres postgres 302 Feb 15 13:40
> 000000010000000000000002.00000028.backup
> -rw-------. 1 postgres postgres 16777216 Feb 15 13:40
> 000000010000000000000003
> drwx------. 2 postgres postgres 133 Feb 15 13:40 archive_status
>
> /var/lib/pgsql/10/data/pg_wal/archive_status:
> total 0
> -rw-------. 1 postgres postgres 0 Feb 15 13:39
> 000000010000000000000001.done
> -rw-------. 1 postgres postgres 0 Feb 15 13:40
> 000000010000000000000002.00000028.backup.done
> -rw-------. 1 postgres postgres 0 Feb 15 13:40
> 000000010000000000000002.done
>
> WAL ARCHIVE
> -rw-rw-rw-. 1 root root 16777216 Feb 15 13:39 000000010000000000000001
> -rw-rw-rw-. 1 root root 16777216 Feb 15 13:40 000000010000000000000002
> -rw-rw-rw-. 1 root root 302 Feb 15 13:40
> 000000010000000000000002.00000028.backup
>
> NEW SLAVE (stream method)
> -rw-------. 1 postgres root 16777216 Feb 15 13:40
> 000000010000000000000002
> -rw-------. 1 postgres postgres 16777216 Feb 15 13:40
> 000000010000000000000003
> drwx------. 2 postgres root 6 Feb 15 13:40 archive_status
>
> /var/lib/pgsql/10/data/pg_wal/archive_status:
> total 0
>
>
> NEW SLAVE (stream method)
> -rw-------. 1 postgres root 16777216 Feb 15 13:40
> 000000010000000000000002
> -rw-------. 1 postgres postgres 16777216 Feb 15 13:40
> 000000010000000000000003
> drwx------. 2 postgres root 6 Feb 15 13:40 archive_status
>
> /var/lib/pgsql/10/data/pg_wal/archive_status:
> total 0
>
> NEW SLAVE (fetch method)
> -rw-------. 1 postgres root 16777216 Feb 15 13:24
> 000000010000000000000002
> -rw-------. 1 postgres postgres 16777216 Feb 15 13:24
> 000000010000000000000003
> drwx------. 2 postgres root 43 Feb 15 13:24 archive_status
>
> /var/lib/pgsql/10/data/pg_wal/archive_status:
> total 0
> -rw-------. 1 postgres root 0 Feb 15 13:24 000000010000000000000002.done
>
> Eventual failover(s) produced this structure on master:
> /var/lib/pgsql/10/data/pg_wal/archive_status:
> total 0
> -rw-------. 1 postgres postgres 0 Feb 15 14:15
> 000000010000000000000002.ready
>
> Hence attempt to archive again:
> 2019-02-15 14:15:58.872 PST [5369] DETAIL: The failed archive command
> was: test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp
> pg_wal/000000010000000000000002 /mnt/pgsql/archive/000000010000000000000002
>
> On Mon, Feb 18, 2019 at 9:03 AM Maksim Milyutin <milyutinma(at)gmail(dot)com>
> wrote:
>
>> On 2/16/19 6:25 PM, Michael Paquier wrote:
>>
>> On Sat, Feb 16, 2019 at 12:26:13AM +0000, PG Bug reporting form wrote:
>>
>> When new slave is created by taking base backup from the primary using
>> pg_basebackup with --wal-method=stream option the WAL file generated during
>> the backup is different (as compared with diff or cmp command) than that on
>> the master and in WAL archive directory. Furthermore, this file does not
>> exist in pg_wal/archive_status with .done extension on new slave, though it
>> exists in pg_wal directory, resulting in failed attempt to archive this file
>> when slave node is promoted as master node.
>> 2019-02-15 14:15:58.872 PST [5369] DETAIL: The failed archive command was:
>> test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp
>> pg_wal/000000010000000000000002
>> /mnt/pgsql/archive/000000010000000000000002
>>
>> How do you promote your standby? In Postgres 10, the last, partial
>> WAL segment of a past timeline generated at promotion is renamed
>> .partial to avoid any conflicts, so as this should normally not
>> happen if you do not use archive_mode = always.
>>
>>
>> This issue concerns only segment that contains wal records from
>> pg_basebackup. For example, I have reproduced this issue on master branch
>> getting the following content of archive directory after standby promoting:
>>
>> 000000010000000000000001.node1
>> 000000010000000000000002.00000028.backup.node1
>> *000000010000000000000002.node1*
>> *000000010000000000000002.node2*
>> 000000010000000000000003.node1
>> 000000010000000000000004.node1
>> *000000010000000000000005.partial.node2*
>> *000000020000000000000005.node2*
>> 00000002.history.node2
>>
>> Each wal segment specifically was appended with .<node> suffix to
>> distinguish its source. Initially the node1 was master, node2 - standby.
>> The segment 000000010000000000000002 has the following content:
>>
>> rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn:
>> 0/02000028, prev 0/015DB060, desc: RUNNING_XACTS nextXid 474
>> latestCompletedXid 473 oldestRunningXid 474
>> rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn:
>> 0/02000060, prev 0/02000028, desc: CHECKPOINT_ONLINE redo 0/2000028; tli 1;
>> prev tli 1; fpw true; xid 0:474; oid 12690; multi 1; offset 0; oldest xid
>> 467 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid:
>> 0/0; oldest running xid 474; online
>> rmgr: XLOG len (rec/tot): 34/ 34, tx: 0, lsn:
>> 0/020000D0, prev 0/02000060, desc: BACKUP_END 0/2000028
>> rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn:
>> 0/020000F8, prev 0/020000D0, desc: SWITCH
>>
>>
>> On the current state of master branch the content of two segments are the
>> same:
>>
>> $ md5sum 000000010000000000000002.node1 000000010000000000000002.node2
>>
>> 252e3e8cf3a85f218bb467005c565e3c 000000010000000000000002.node1
>> 252e3e8cf3a85f218bb467005c565e3c 000000010000000000000002.node2
>>
>> But on PG 10.6 the contents are differentiated on the tails:
>>
>> $ cmp 000000010000000000000002.node1 000000010000000000000002.node2
>>
>> 000000010000000000000002.node1 000000010000000000000002.node2 differ:
>> byte *131073*, line 1
>>
>> while logical part (prefix) are the same - the lsn of last record
>> (SWITCH) in this segment is 0/02000130 and have length 24 bytes.
>>
>>
>> Is it correct behavior? What would be the correct canonical
>> representation of archive command taking into account this issue?
>>
>>
>> --
>> Regards,
>> Maksim Milyutin
>>
>>
>
> --
>
> *Andre Piwoni*
>
> Sr. Software Developer, BI/Database
>
> *Web*MD Health Services
>
> Mobile: 801.541.4722
>
> www.webmdhealthservices.com
>

--

*Andre Piwoni*

Sr. Software Developer, BI/Database

*Web*MD Health Services

Mobile: 801.541.4722

www.webmdhealthservices.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2019-02-19 20:01:45 Re: BUG #15572: Misleading message reported by "Drop function operation" on DB with functions having same name
Previous Message Julien Rouhaud 2019-02-19 17:48:04 Re: BUG #15572: Misleading message reported by "Drop function operation" on DB with functions having same name