Re: rsync and streaming replication

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
Cc: Jean-Armel Luce <jaluce06(at)gmail(dot)com>, "[ADMIN]" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: rsync and streaming replication
Date: 2011-11-16 17:38:16
Message-ID: CAF6yO=3XQurt1RfJwVpaq2Ea_NpQoq9L8ayP4AxPgQ4Vo5fFEA@mail.gmail.com
Lists: pgsql-admin

2011/11/16 Scott Ribe <scott_ribe(at)elevated-dev(dot)com>:
> On Nov 16, 2011, at 10:11 AM, Jean-Armel Luce wrote:
>
>> You are right. I used -a, and I wanted to be more explicit, so I wrote --all in my post.
>> Please read --archive instead of --all.
>
> Oh, OK. Still seems odd that it took so much longer. Granted, for the files with different timestamps but identical contents, it then syncs them. But it does so by checksumming blocks, comparing checksums, and sending only blocks that are different over the network. Granted, it has to send some checksums over the network, but that's pretty minor traffic. I believe you said you'd seen 125Mb/s over the network? Is that actually accurate? Does the network connection have high latency?
>
> Also, I believe you said -z seemed to slow it down? That has not been my experience at all with rsync'ing pg databases. Between all the values that are stored as plain text, and the redundancies in indexes, I usually see a good speed increase from compressing the data in transit.
>
> I'm certainly glad that you've got a 3x speed increase--that's significant progress. But still, something seems odd about the performance you've reported, so I'm left wondering about what could cause that. Disk performance glitch at one end or the other, network performance, CPU load???

Checksums are calculated per 1024-byte block (maybe 2048) by default;
for 100GB, that makes a huge number of checksums calculated for nothing.
It is possible to increase the block size (to 8kB, for example) used for
the checksums, but that also increases the risk of false positives (I do
not know whether it is possible to tell rsync which checksum algorithm
and size to use, or whether there can be md5sum collisions on 8kB of data).
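
To give a rough idea of the scale, here is a back-of-the-envelope sketch in Python. It only counts blocks; it does not model what rsync really does internally. The 100GB total and the three block sizes are the figures from this thread, while the helper name block_count and the choice of 1GB = 1024**3 bytes are assumptions made for the illustration.

```python
# Rough count of checksummed blocks for a 100GB data set.
# Assumptions (not taken from rsync itself): 1GB = 1024**3 bytes, and the
# whole data set is cut into fixed-size blocks of the given size.

def block_count(total_bytes, block_size):
    """Number of blocks of block_size bytes needed to cover total_bytes."""
    return -(-total_bytes // block_size)  # ceiling division on integers

TOTAL_BYTES = 100 * 1024**3

for size in (1024, 2048, 8192):
    blocks = block_count(TOTAL_BYTES, size)
    print(f"block size {size:>5} bytes -> {blocks:>11,} blocks")
```

Going from 1024 to 8192 bytes divides the number of blocks by 8. The block size itself can be forced with rsync's --block-size option (-B).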

>
> One thing worth doing I think is to use --stats on every test, so you can see every time how many files and how much data is actually transferred. Also, if you're sitting there watching, sometimes --progress can be informative...
>
> --
> Scott Ribe
> scott_ribe(at)elevated-dev(dot)com
> http://www.elevated-dev.com/
> (303) 722-0567 voice

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
