Re: pg_upgrade and rsync

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade and rsync
Date: 2015-01-28 22:10:46
Message-ID: 54C95E66.1000309@agliodbs.com
Lists: pgsql-hackers

Bruce, Stephen, etc.:

So, I did a test trial of this and it seems like it didn't solve the
issue of huge rsyncs.

That is, the only reason to do this whole business via rsync, instead of
doing a new basebackup of each replica, is to cut down on data transfer
time by not resyncing the data from the old base directory. But in
practice, most of the database files seem to get transmitted anyway.
Maybe I'm misreading the rsync output?

Here's the setup:

3 Ubuntu 14.04 servers on AWS (tiny instance)
Running PostgreSQL 9.3.5
Set up in cascading replication

108 --> 107 --> 109

The goal was to test this with cascading, but I didn't get that far.

I set up a pgbench workload, read-write on the master and read-only on
the two replicas, to simulate a load-balanced workload. I was *not*
logging hint bits.

I then followed this sequence:

1) Install 9.4 packages on all servers.
2) Shut down the master.
3) pg_upgrade the master using --link.
4) Shut down replica 107.
5) rsync the master's $PGDATA from the replica:

rsync -aHv --size-only -e ssh --itemize-changes \
    172.31.4.108:/var/lib/postgresql/ /var/lib/postgresql/

... and got:

.d..t...... 9.4/main/pg_xlog/
>f+++++++++ 9.4/main/pg_xlog/0000000700000001000000CB
.d..t...... 9.4/main/pg_xlog/archive_status/

sent 126892 bytes received 408645000 bytes 7640596.11 bytes/sec
total size is 671135675 speedup is 1.64
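
One way to check whether the itemized output is being read correctly is to
tally it by update type. The following is a minimal Python sketch, not part
of the original test. It assumes the standard --itemize-changes layout, in
which the first character is the update type ('<' or '>' for a transferred
file, 'h' for a hard link, 'c' for a local creation, '.' for an item that was
not updated) and the second is the file type. The function names and labels
are made up for illustration.

```python
from collections import Counter


def classify_itemized(line):
    """Return a coarse label for one line of rsync --itemize-changes output."""
    code = line.split(" ", 1)[0]
    if len(code) < 2:
        return "other"
    update, ftype = code[0], code[1]
    if update in "<>" and ftype == "f":
        return "file_transferred"   # file content was sent over the wire
    if update == "h":
        return "hard_link"          # recreated as a hard link (needs -H)
    if update == "c":
        return "created_locally"    # e.g. a directory created on the receiver
    if update == ".":
        return "attributes_only"    # not updated; at most attributes changed
    return "other"


def summarize(lines):
    """Count labels over all non-empty lines."""
    return Counter(classify_itemized(l) for l in lines if l.strip())


# The three lines quoted above, used here only as sample input.
sample = [
    ".d..t...... 9.4/main/pg_xlog/",
    ">f+++++++++ 9.4/main/pg_xlog/0000000700000001000000CB",
    ".d..t...... 9.4/main/pg_xlog/archive_status/",
]
print(summarize(sample))
```

Run over the full rsync output, a large "file_transferred" count relative to
"hard_link" would suggest that file contents really are being copied.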

So that's 390MB of data transfer.

If I look at the original directory:

postgres@paul: du --max-depth=1 -h
4.0K ./.cache
20K ./.ssh
424M ./9.3
4.0K ./.emacs.d
51M ./9.4
56K ./bench
474M .
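
Since du charges a hard-linked file only once per run, the 51M shown for 9.4
is presumably just the part that is not shared with 9.3. A rough way to see
the same split per tree is to group files by inode. This is a sketch under
assumptions: tree_usage is a hypothetical helper, and it sums file sizes
rather than disk blocks, so its totals will not match du exactly.

```python
import os
import stat


def tree_usage(root):
    """Return (apparent_bytes, unique_bytes, multiply_linked_files) for root.

    apparent_bytes counts every file name; unique_bytes counts each inode
    once; multiply_linked_files counts names whose inode has link count > 1.
    """
    seen = set()
    apparent = unique = linked = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and other non-regular files
            apparent += st.st_size
            if st.st_nlink > 1:
                linked += 1
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return apparent, unique, linked
```

Calling tree_usage("/var/lib/postgresql") on the master and on the replica
after the rsync would show whether the hard-link structure survived the copy.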

So 390MB were transferred out of a possible 474MB. That certainly seems
like we're still transferring the majority of the data, even though I
verified that the hard links are being sent as hard links. No?
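
For reference, the arithmetic behind those figures can be reproduced from the
rsync summary lines. This is only a sanity check of the numbers quoted above.
It assumes "MB" here means 2^20 bytes and that rsync's speedup is total size
divided by bytes sent plus bytes received, which is consistent with the
reported 1.64.

```python
# Figures copied from the rsync summary and the du output above.
sent = 126_892            # bytes sent
received = 408_645_000    # bytes received
total_size = 671_135_675  # rsync "total size"
du_total_mb = 474         # du total for the whole directory, in MB

MIB = 1024 * 1024

received_mib = received / MIB
speedup = total_size / (sent + received)
share = 390 / du_total_mb  # fraction of the du total that was transferred

print(f"received: {received_mib:.0f} MiB")
print(f"speedup:  {speedup:.2f}")
print(f"share:    {share:.0%}")
```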

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
