Re: Reliable WAL file shipping over unreliable network

From: Rui DeSousa <rui(dot)desousa(at)icloud(dot)com>
To: Dianne Skoll <dfs(at)roaringpenguin(dot)com>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Reliable WAL file shipping over unreliable network
Date: 2018-03-01 01:10:59
Message-ID: 21AF0CB4-0873-4960-BA14-E72FA08B352E@icloud.com
Lists: pgsql-admin


I’ve tested this, and it seems there is still a bug in rsync (rsync version 3.1.2, protocol version 31). I used a 1GB archive filesystem so I could test the out-of-space case. I'm not sure of the actual cause: it works a few times, but then it fails, leaving a truncated file while returning a success code.

Example: 00000001000000590000003E failed 4 times; on the fifth try rsync returned success and left a truncated file.

When there is actually no space left, rsync fails immediately and never returns a success code (e.g. 00000001000000590000003F). Once space is freed, archiving resumes. It seems that if rsync is already mid-transfer when the file system fills up, there is a high risk the bug will occur; 000000010000005A00000001, for example, is also truncated despite an rsync success code.

Since rsync returns success on a failed sync, even the "-c" option will not help here.
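One way to avoid trusting rsync's exit code alone is to read the archived copy back and compare it with the source before reporting success to PostgreSQL. This is only a sketch under assumptions, not part of the script I tested: `verify_archived_copy` is a hypothetical helper, and it assumes the archive server is reachable over ssh with the same credentials rsync uses, and that `cksum` there prints the same format as locally (both default to the POSIX CRC). Because `cksum` reports both the CRC and the byte count, a truncated destination file fails the comparison.

```shell
# Hypothetical helper: succeed only if the archived copy on the remote
# host has the same CRC and byte count as the local WAL file.
verify_archived_copy() {
    src=$1        # local WAL path (what the script calls XLOGFILE)
    host=$2       # archive server (ARCH_SERVER)
    dst=$3        # remote path of the archived segment (ARCH_DIR/WALFILE)

    # cksum on stdin prints "<CRC> <byte count>" without a file name.
    local_sum=$(cksum < "$src") || return 1

    # Read the file back on the archive server. Assumes the path has no
    # single quotes, which holds for WAL segment names.
    remote_sum=$(ssh "$host" "cksum < '$dst'") || return 1

    [ "$local_sum" = "$remote_sum" ]
}
```

In the archive script it would run only after rsync reports success, e.g. `verify_archived_copy "$XLOGFILE" "$ARCH_SERVER" "$ARCH_DIR/$WALFILE" || STS=1`, so PostgreSQL keeps retrying the segment. It only confirms the bytes visible on the archive server at that moment, and it roughly doubles the read I/O per segment.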

Archive Script critical code:

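# Default to failure; if STS were unset, `exit $STS` would become a bare
# `exit` and return the status of the last echo (0).
STS=1
OUTPUT=$(rsync -ac "$XLOGFILE" "$ARCH_SERVER:$ARCH_DIR/$WALFILE")
if [ $? -eq 0 ]; then
    STS=0
    echo "Success: $WALFILE" >> /tmp/waltest.log
else
    echo "Failed: $WALFILE" >> /tmp/waltest.log
fi

exit $STS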
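Whether a failed rsync actually reaches PostgreSQL as a failure depends on STS holding a non-zero value before the check. If STS is never assigned, `exit $STS` expands to a bare `exit`, which returns the status of the last command executed; after the `echo "Failed: ..."` line that is 0, so PostgreSQL would treat the segment as archived. The sketch below shows the difference; `run_and_report` is a hypothetical helper, not part of the archive script.

```shell
# Hypothetical helper: run a fragment in a fresh shell and print its
# exit status.
run_and_report() {
    if sh -c "$1"; then
        echo 0
    else
        echo "$?"
    fi
}

# STS unset: `exit $STS` becomes a bare `exit`, returning the status of
# the successful echo, so the failure is hidden.
run_and_report 'unset STS; echo "Failed: demo" >/dev/null; exit $STS'   # prints 0

# STS defaulted to 1: the failure is reported as a non-zero status.
run_and_report 'STS=1; echo "Failed: demo" >/dev/null; exit $STS'       # prints 1
```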

Archive Directory (Note: using 64MB WALs):

[postgres(at)hades ~/arch/dbc1/wal]$ ls -al
total 1044351
drwxr-xr-x 2 postgres postgres 57 Feb 28 19:46 .
drwxr-xr-x 3 postgres postgres 3 Feb 28 17:33 ..
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000000B
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000000C
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000000D
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000000E
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000000F
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000010
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000011
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000012
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000013
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000014
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000015
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000016
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000017
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000018
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000019
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000001A
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000001B
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000001C
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000001D
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000001E
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000001F
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000020
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000021
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000022
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000023
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000024
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000025
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000026
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000027
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000028
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000029
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000002A
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000002B
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000002C
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000002D
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000002E
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000002F
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000030
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000031
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000032
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000033
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000034
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000035
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000036
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000037
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000038
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000039
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000003A
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000003B
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000003C
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000003D
-rw------- 1 postgres postgres 3670016 Feb 28 17:33 00000001000000590000003E
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000003F
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005A00000000
-rw------- 1 postgres postgres 12713984 Feb 28 17:33 000000010000005A00000001
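Since every segment should be exactly the WAL segment size, short files can be found mechanically instead of by eye. A minimal sketch, assuming the archive directory holds plain, uncompressed segment files; `find_short_segments` is a hypothetical helper, and the 67108864-byte default matches the 64MB segments in this test (PostgreSQL's default is 16MB). Against the listing above it should report 00000001000000590000003E (3670016 bytes) and 000000010000005A00000001 (12713984 bytes).

```shell
# Hypothetical helper: list archived WAL segments whose size is not
# exactly the expected segment size.
find_short_segments() {
    dir=$1
    seg_bytes=${2:-67108864}    # 64MB segments, as used in this test
    # Segment names are hex digits with no suffix; '*.*' skips .history,
    # .backup and .partial files. The "c" suffix makes -size count bytes.
    find "$dir" -type f -name '[0-9A-F]*' ! -name '*.*' \
        ! -size "${seg_bytes}c" -print
}

# Example: find_short_segments ~/arch/dbc1/wal
```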

Success/Failure Log:

[postgres(at)hades ~/arch/dbc1/wal]$ cat /tmp/waltest.log
Success: 000000010000005900000008
Success: 000000010000005900000009
Success: 00000001000000590000000A
Success: 00000001000000590000000B
Success: 00000001000000590000000C
Success: 00000001000000590000000D
Success: 00000001000000590000000E
Success: 00000001000000590000000F
Success: 000000010000005900000010
Success: 000000010000005900000011
Success: 000000010000005900000012
Success: 000000010000005900000013
Success: 000000010000005900000014
Success: 000000010000005900000015
Success: 000000010000005900000016
Success: 000000010000005900000017
Success: 000000010000005900000018
Success: 000000010000005900000019
Success: 00000001000000590000001A
Success: 00000001000000590000001B
Success: 00000001000000590000001C
Success: 00000001000000590000001D
Success: 00000001000000590000001E
Success: 00000001000000590000001F
Success: 000000010000005900000020
Success: 000000010000005900000021
Success: 000000010000005900000022
Success: 000000010000005900000023
Success: 000000010000005900000024
Success: 000000010000005900000025
Success: 000000010000005900000026
Success: 000000010000005900000027
Success: 000000010000005900000028
Success: 000000010000005900000029
Success: 00000001000000590000002A
Success: 00000001000000590000002B
Success: 00000001000000590000002C
Success: 00000001000000590000002D
Success: 00000001000000590000002E
Success: 00000001000000590000002F
Success: 000000010000005900000030
Success: 000000010000005900000031
Success: 000000010000005900000032
Success: 000000010000005900000033
Success: 000000010000005900000034
Success: 000000010000005900000035
Success: 000000010000005900000036
Success: 000000010000005900000037
Success: 000000010000005900000038
Success: 000000010000005900000039
Success: 00000001000000590000003A
Success: 00000001000000590000003B
Success: 00000001000000590000003C
Success: 00000001000000590000003D
Failed: 00000001000000590000003E
Failed: 00000001000000590000003E
Failed: 00000001000000590000003E
Failed: 00000001000000590000003E
Success: 00000001000000590000003E
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Success: 00000001000000590000003F
Success: 000000010000005A00000000
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Success: 000000010000005A00000001
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
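The pattern in the log is that the truncated segments are ones that failed at least once and then "succeeded". A small sketch to pull those out, assuming the log format written by the script above; `retried_segments` is a hypothetical helper. On this run it would list 00000001000000590000003E, 00000001000000590000003F and 000000010000005A00000001, but 3F arrived intact, so the output is a shortlist to check against the directory listing, not proof of truncation.

```shell
# Hypothetical helper: print each segment that logged one or more
# "Failed:" lines before a later "Success:", with its failure count.
retried_segments() {
    awk '
        $1 == "Failed:"                    { failed[$2]++ }
        $1 == "Success:" && ($2 in failed) { print $2, failed[$2] }
    ' "${1:-/tmp/waltest.log}"
}

# Example: retried_segments /tmp/waltest.log
```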
