Re: Backup/disaster recovery and bandwidth (long)

From: Scott Whitney <scott(at)journyx(dot)com>
To: amador alvarez <aalvarez(at)d2(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Backup/disaster recovery and bandwidth (long)
Date: 2012-04-25 16:53:17
Message-ID: f4ffaad3-f4ca-490d-8096-755c14d630a1@zimbra.int.journyx.com
Lists: pgsql-admin

Both good points, thanks, although I suspect that a direct network copy of the pg_data directory will be faster than a tar/untar event.
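
For the record, a rough sketch of the two copy styles I'm weighing (the hostname and paths are placeholders, and the standby would need to be cleanly shut down, or inside a proper base backup, for either copy to be consistent):

    # direct network copy of the data directory (incremental on later runs):
    rsync -a --delete /var/lib/pgsql/data/ standby4:/var/lib/pgsql/data/

    # versus a tar stream over ssh (no temp file, but always copies everything):
    tar -C /var/lib/pgsql -cf - data | ssh standby4 'tar -C /var/lib/pgsql -xf -'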

----- Original Message -----

> Hi Scott,
> Why don't you replicate this master to the other location(s) using
> other methods like Bucardo? You can pick just the tables you really
> want replicated there.
> For the backup, switch to a hot backup (tar $PGDATA) plus WAL
> archiving; it's easier, faster and more efficient than a logical
> copy with pg_dump.

> A.A
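
A minimal sketch of the hot-backup approach you describe, assuming pg_start_backup()/pg_stop_backup() plus WAL archiving; the paths and the archive destination below are placeholders:

    # prerequisite in postgresql.conf (WAL archiving already on):
    #   archive_mode    = on
    #   archive_command = 'cp %p /backup/wal/%f'

    psql -U postgres -c "SELECT pg_start_backup('nightly');"
    tar -czf /backup/base/pgdata-$(date +%Y%m%d).tar.gz -C "$PGDATA" .
    psql -U postgres -c "SELECT pg_stop_backup();"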

> On 04/25/2012 09:11 AM, Scott Whitney wrote:
> > Hello, everyone. I want to throw a scenario out there to see what
> > y'all think.
> >
> > Soon, my cluster backups will be increasing in size inordinately.
> > They're going to immediately go to 3x as large as they currently
> > are, with the potential to be about 20x within a year or so.
> >
> > My current setup uses a single PG 8.x server doing nightly dumps
> > (not ideal but sufficient for the moment, and one of the main
> > reasons to move to PG 9) which are then downloaded from my hosting
> > center to our offices for DR purposes. Each night I pull down
> > roughly 5GB of compressed pg_dump data. Dumping this takes about
> > 1.5hrs. Downloading it at 15Mbps takes about an hour. Soon I'll be
> > looking at somewhere around 7hrs for the dumps to complete and
> > downloading a 12GB file (which will take about 3hrs). Oh, and I'll
> > have to pay for significant bandwidth overage since I'm billed at
> > the 95th percentile, and while an hour a day does NOT kick me up
> > to 15Mbps usage at 95%, 3hrs per night certainly will, so there's
> > a real cost associated with this strategy as well.
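
For reference, the raw line-rate math behind those numbers (real throughput runs somewhat below line rate, which is where the ~1hr and ~3hr figures come from):

    # GiB -> bits, divided by 15 Mbit/s, in minutes
    echo "5 GB : $(echo '5  * 1024^3 * 8 / 15000000 / 60' | bc) min at line rate"
    echo "12 GB: $(echo '12 * 1024^3 * 8 / 15000000 / 60' | bc) min at line rate"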
> >
> > While the time of the actual dumps is not a huge issue, the time
> > of the download IS a large concern, especially since my support
> > folks use that file daily to extract individual customer databases
> > for restore in assisting customer support issues.
> >
> > So, while now I have my pg_dumps completed around 2AM and
> > downloaded to my local network at about 3AM, with the increase in
> > our database sizes, what will be happening is that my pg_dump will
> > not be completed until around 7AM, and the download would not be
> > completed until around 10AM, best-case scenario. Add into that
> > support trying to restore a database... more on that in a moment.
> >
> > My _new_ setup will instead be 2 PG 9.x servers with hot-standby
> > enabled (at my hosting center) and a 3rd PG 9.x server at my local
> > office also replicating off of the master. Each one of those
> > servers will perform its own pg_dumps of the individual databases
> > for backup/disaster recovery purposes, and while the dumps might
> > not be consistent with one another, each SERVER will have dumps
> > consistent with itself, which is viable for our situation, and
> > does not require me to download 12GB (or more) each night with all
> > of those associated nightmares, costs and other problems.
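
Roughly the 9.x wiring that implies, sketched with placeholder hosts, users and paths (parameter names are the 9.0/9.1-era ones):

    # master postgresql.conf:
    #   wal_level       = hot_standby
    #   max_wal_senders = 3
    # master pg_hba.conf (one line per standby; addresses are placeholders):
    #   host  replication  repuser  10.0.0.0/24  md5
    #
    # each standby: hot_standby = on in postgresql.conf, plus a recovery.conf:
    #   standby_mode     = 'on'
    #   primary_conninfo = 'host=master.example.com user=repuser password=secret'

    # nightly, on whichever server the dumps should live on:
    pg_dump -Fc -f /backup/dumps/customer_db.dump customer_db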
> >
> > Alright, well, I've got that part all thought out, and it seems
> > like a good way to do it to me. But I'm _still_ running into the
> > situation that the pg_dump takes roughly 8hrs no matter where it
> > runs, and when my support folks need it (which they do daily),
> > that basically means that if they have to have a customer database
> > up NOW NOW NOW for support reasons, they simply cannot have it
> > within an hour in many cases. Specifically, one database takes
> > between 2 and 7.5hrs to pg_dump depending on which format I use,
> > so if they need a CURRENT copy, they're at least 4 hours out.
> > Additionally, they can't directly use the replicating server at my
> > local office, because they need to test the problems the customers
> > are having, which include pesky things like INSERT, UPDATE and
> > DELETE, so they have to restore this data to another internal PG
> > backend.
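
One knob that may help there, depending on where the time actually goes: dump that database in custom format and let pg_restore parallelize the restore on the support box (parallel pg_dump itself, -j, only arrived later with the directory format in 9.3). Database names below are made up:

    # custom-format dump of the one big customer database:
    pg_dump -Fc -f /backup/dumps/bigcustomer.dump bigcustomer

    # restore it onto the internal support server with several parallel jobs:
    createdb bigcustomer_copy
    pg_restore -j 4 -d bigcustomer_copy /backup/dumps/bigcustomer.dump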
> >
> > Enter my outside-the-box thinking.
> >
> > I rather assume that you cannot do a start/stop backup on a
> > hot-standby server. HOWEVER, what if....
> >
> > I set up a 4th database server internally at my office. Each night
> > I stop PG on my 3rd server (the local one replicating off of the
> > master) and rsync my pg_data directory to this new 4th server. I
> > bring up the 4th server NOT as a standby, but as a master. They
> > would then have all customer data on an internal, usable PG system
> > from the time of the rsync, and while it might not reflect the
> > immediate state of the database, that's pretty well always true,
> > and they're used to that, since whenever they "clone" a site,
> > they're using the dumps done around midnight anyway.
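
Sketching that nightly shuffle out (hosts, paths and the logfile are placeholders, and I'm assuming the same $PGDATA path on both boxes; the key bit is that removing recovery.conf is what makes #4 come up as a writable standalone rather than a standby):

    pg_ctl -D "$PGDATA" stop -m fast                  # pause the local standby (#3)
    rsync -a --delete "$PGDATA"/ server4:"$PGDATA"/   # copy its data directory to #4
    ssh server4 "rm -f $PGDATA/recovery.conf && pg_ctl -D $PGDATA -l /tmp/pg.log start"
    pg_ctl -D "$PGDATA" start                         # #3 resumes replicating and catches up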
> >
> > I believe, then, that when I restart server #3 (the standby who is
> > replicating), he'll say "oh, geez, I was down, let me catch up on
> > all that crap that happened while I was out of the loop," he'll
> > replay the WAL files that were written while he was down, and then
> > he'll catch back up.
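
The one caveat I'm aware of: the WAL from the window while #3 was down has to still be reachable, either because the master keeps enough segments around or because the standby can pull them back out of the archive. Something like (values illustrative):

    # master postgresql.conf: keep enough WAL to cover the nightly copy window
    #   wal_keep_segments = 512       # 512 x 16MB segments = ~8GB retained

    # and/or standby recovery.conf: fall back to the archive when streaming can't supply it
    #   restore_command = 'cp /backup/wal/%f %p'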
> >
> > Does this sound like a viable option? Or does someone have
> > additional suggestions?
