Re: pg_upgrade and rsync

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: pg_upgrade and rsync
Date: 2015-01-23 18:34:57
Message-ID: 54C29451.7020103@BlueTreble.com
Lists: pgsql-hackers

On 1/22/15 7:54 PM, Stephen Frost wrote:
> * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
>> On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
>>> Or do you - as the text edited in your patch, but not the quote above -
>>> mean to run pg_upgrade just on the primary and then rsync?
>>
>> No, I was going to run it on both, then rsync.
> I'm pretty sure this is all a lot easier than you believe it to be. If
> you want to recreate what pg_upgrade does to a cluster then the simplest
> thing to do is rsync before removing any of the hard links. rsync will
> simply recreate the same hard link tree that pg_upgrade created when it
> ran, and update files which were actually changed (the catalog tables).
>
> The problem, as mentioned elsewhere, is that you have to checksum all
> the files because the timestamps will differ. You can actually get
> around that with rsync if you really want though- tell it to only look
> at file sizes instead of size+time by passing in --size-only.

What if instead of trying to handle that on the rsync side, we changed pg_upgrade so that it created hardlinks that had the same timestamp as the original file?
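
For reference, here is a minimal sketch of how timestamps look on a hard link, assuming a POSIX filesystem. The file names and the helper link_shares_timestamp are made up for illustration; this is not pg_upgrade code.

```python
import os
import tempfile


def link_shares_timestamp(directory):
    """Create a file plus a hard link to it; report whether inode and mtime match."""
    original = os.path.join(directory, "16384")      # hypothetical relation file name
    linked = os.path.join(directory, "16384.link")   # hypothetical link name
    with open(original, "wb") as f:
        f.write(b"\0" * 8192)                        # one 8 kB block of zeros
    os.link(original, linked)                        # second directory entry, same inode
    a, b = os.stat(original), os.stat(linked)
    return a.st_ino == b.st_ino, a.st_mtime_ns == b.st_mtime_ns


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(link_shares_timestamp(d))
```

On a POSIX filesystem a hard link is just another name for the same inode, so the link and the original report the same mtime. If that holds, any timestamp difference would be between the copies on the two servers rather than between a link and its original, though I haven't checked every platform pg_upgrade supports.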

That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's not that big a deal, but when dealing with fixed-size data (i.e., Postgres blocks)? Eww.
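
To make that worry concrete, here is a rough sketch of how a size+time comparison can miss an in-place block rewrite. quick_check_key is my own approximation of such a comparison (size plus whole-second mtime), not rsync's actual code, and the file name is hypothetical. The os.utime call simulates a second write landing inside the same timestamp granule.

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 8192  # default Postgres block size


def quick_check_key(path):
    """Size and whole-second mtime: an approximation of a size+time comparison."""
    st = os.stat(path)
    return st.st_size, int(st.st_mtime)


def digest(path):
    """SHA-256 of the file contents, standing in for a full checksum pass."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def demo(directory):
    path = os.path.join(directory, "relfile")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"A" * BLOCK_SIZE)
    st = os.stat(path)
    key_before, sum_before = quick_check_key(path), digest(path)

    # Overwrite the block in place: same length, different contents.
    with open(path, "r+b") as f:
        f.write(b"B" * BLOCK_SIZE)
    # Simulate both writes sharing one timestamp by restoring the earlier times.
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))

    key_after, sum_after = quick_check_key(path), digest(path)
    return key_before == key_after, sum_before == sum_after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo(d))
```

The first element of the result says whether the size+time key is unchanged, the second whether the contents are unchanged. With fixed-size blocks the size never moves, so the timestamp is the only thing standing between a changed block and a skipped file.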

How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a shell script instead of directly modifying things itself? Or perhaps some custom "command file" that could then be replayed by pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters for that to work...
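
A very rough sketch of what I mean by a "command file", assuming the only operation worth recording is "hard-link this old file to that new path". The JSON-lines format and the names record_links and replay are invented for illustration; nothing like this exists in pg_upgrade today as far as I know.

```python
import json
import os


def record_links(pairs, command_file):
    """Write one JSON object per line, each describing a hard link to create."""
    with open(command_file, "w") as f:
        for old_rel, new_rel in pairs:
            f.write(json.dumps({"op": "link", "old": old_rel, "new": new_rel}) + "\n")


def replay(command_file, old_root, new_root):
    """Re-create the recorded links under the given roots; return how many were made."""
    count = 0
    with open(command_file) as f:
        for line in f:
            cmd = json.loads(line)
            if cmd["op"] != "link":
                raise ValueError(f"unknown op: {cmd['op']}")
            src = os.path.join(old_root, cmd["old"])
            dst = os.path.join(new_root, cmd["new"])
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            os.link(src, dst)  # both roots must be on the same filesystem
            count += 1
    return count
```

The paths are stored relative to the data directories so the same file could be replayed on a replica whose clusters live somewhere else. It only works if the replica's file layout matches the master's, which is exactly the compatibility question above.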
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
