Re: Reliable WAL file shipping over unreliable network

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Rui DeSousa <rui(dot)desousa(at)icloud(dot)com>, scott ribe <scott_ribe(at)elevated-dev(dot)com>
Cc: Dianne Skoll <dfs(at)roaringpenguin(dot)com>, pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Reliable WAL file shipping over unreliable network
Date: 2018-03-01 22:40:51
Message-ID: f121e8a2-2321-7478-4759-a02377410994@catalyst.net.nz
Lists: pgsql-admin

On 02/03/18 03:21, Rui DeSousa wrote:

>
>
>> On Mar 1, 2018, at 12:21 AM, scott ribe <scott_ribe(at)elevated-dev(dot)com> wrote:
>>
>> The false report of success is not good, but it's not the root problem.
>
> A false success is a problem, especially in this use case, as the
> source WAL file will be deleted by Postgres before the transfer has
> truly succeeded.  While monitoring is nice for catching the issue, it
> is not a fix for the issue.
>
> I personally cannot recommend the use of rsync in this application for
> two reasons.
>
> 1. It adds no value; it’s a more complex cp command (no bandwidth
> saved, etc as archive processes a single file at a time).
> 2. It lies on success/failure — Period.
>
>
> I have used “cat” longer than I have used rsync to archive WALs.  I can
> say that I’ve lost zero WAL files using cat; I cannot say the same
> for rsync.
>
> The following code is more reliable than rsync and works across
> multiple platforms and filesystems without fail.
>
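> STS=3
>
> # Feed the WAL file to ssh by redirection rather than "cat file |", so an
> # unreadable source fails the command instead of sending empty input.
> # $SSH_CMD is left unquoted on purpose so it can hold "ssh host".
> OUTPUT=$($SSH_CMD \
>     "(mkdir -p $ARCH_DIR && cat > $ARCH_DIR/$WALFILE.swap) && mv $ARCH_DIR/$WALFILE.swap $ARCH_DIR/$WALFILE" \
>     < "$XLOGFILE")
> # -eq is the portable numeric test; == inside [ ] is a bash extension.
> if [ $? -eq 0 ]; then
>    STS=0
> fi
>
> exit $STS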
>
>
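
The quoted script writes to a ".swap" name and renames it only after the write has succeeded. A minimal local sketch of that pattern follows, assuming a destination directory on a local filesystem instead of an ssh target. The function name archive_file and the return code 3 are illustrative choices, not anything from the thread.

```shell
# Hypothetical helper (not from the thread): copy SRC to DEST_DIR/NAME via a
# temporary ".swap" file, renaming only after the copy reported success.
archive_file() {
    src=$1
    dest_dir=$2
    name=$3

    # Refuse to report success for a missing or unreadable source.
    [ -r "$src" ] || return 3

    mkdir -p "$dest_dir" || return 3

    # Input redirection instead of "cat file |": the exit status checked
    # here is that of the writer, with no earlier pipeline stage whose
    # failure could go unnoticed.
    cat < "$src" > "$dest_dir/$name.swap" || return 3

    # rename() within one filesystem is atomic, so a reader never sees a
    # partially written file under the final name.
    mv "$dest_dir/$name.swap" "$dest_dir/$name" || return 3

    return 0
}
```

Called as archive_file "$XLOGFILE" "$ARCH_DIR" "$WALFILE", it returns 0 only when every step succeeded, which is the property an archive command needs before Postgres is allowed to recycle the segment.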

If you have a self-contained case that demonstrates rsync returning 0
when it has actually failed, then please do get the rsync authors
involved in investigating it (I'm sure they would be interested).

I have been unable to reproduce any cases of bad return codes or
zero-length files (using an rsync-based archive command plus quotas),
but I'm probably not using the same setup as you (and probably a
different platform as well).
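
One way to hunt for such a case is sketched below, as a rough harness and not a definitive test: run the copy command, then treat "exit 0 but destination missing, empty, or different" as a false success. The function name check_copy and its return codes are assumptions made for illustration, and the destination is assumed to be locally readable (for example a mounted filesystem with a quota).

```shell
# Hypothetical harness: check_copy SRC DEST COMMAND [ARGS...]
# Runs COMMAND and reports whether an exit status of 0 was truthful.
# Assumes SRC is non-empty, as a WAL segment always is.
check_copy() {
    src=$1
    dest=$2
    shift 2

    "$@"                # the copy command under test
    status=$?

    if [ "$status" -ne 0 ]; then
        echo "copy failed and said so: exit $status"
        return 1
    fi
    if [ ! -s "$dest" ]; then
        echo "FALSE SUCCESS: exit 0 but destination is missing or empty"
        return 2
    fi
    if ! cmp -s "$src" "$dest"; then
        echo "FALSE SUCCESS: exit 0 but destination differs from source"
        return 2
    fi
    echo "ok"
    return 0
}
```

For instance, check_copy "$f" "$dir/$name" rsync -a "$f" "$dir/$name" run in a loop against a nearly full quota would return 2 on exactly the behaviour described in this thread, which would make a good starting point for a bug report.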

regards
Mark
