Re: Looking for help on backing up a large database (500GB+) with LOBs

From: David Kerr <dmk(at)mr-paradox(dot)net>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Looking for help on backing up a large database (500GB+) with LOBs
Date: 2012-03-11 22:20:10
Message-ID: 4F5D251A.7000905@mr-paradox.net
Lists: pgsql-admin

On 03/06/2012 02:03 PM, Todd Hunt wrote:
> Hi,
> First let me state that I'm not a DBA, but a developer. I know enough
> about databases to be dangerous, but not enough to make any money at
> it. ;-)
> We are using large objects with OIDs as part of our data model. One
> of our processes is to back up the database every night. Our DBAs told
> me that pg_dump was taking too long. So they decided to shut down
> Postgres, TAR up the directory to a backup location, restart
> Postgres, then copy the back up TAR to tape.
> For our smaller sites, 200GB or less (mostly LOBs), it takes less than
> an hour to shut down, TAR, and restart (writing to tape is not part of
> this time frame). Some of our larger sites have 500GB+ worth of data,
> which is mostly LOBs. Our DBAs want to move the LOBs out of the
> database and store them on the file system, and have each record store
> a path to the binary files. I'd like to come up with a better and
> faster backup solution that allows the LOBs to stay in the DB.
> A few things to note: when a LOB gets inserted into the DB, it
> is never updated. It may be deleted on rare occasions, but never
> updated. Also, the DBAs are against incremental backups and I don't
> blame them, sort of.
> I'm open to any ideas. The servers are pretty standard. They
> initially come with four 1TB hard drives running in RAID 10, so they have
> 2TB available. There is another controller card and space for 4 more
> drives. We want to keep cost down, but uptime is very important.
> Even though I'm not a Sys Admin either, I was wondering if there would
> be a way to replicate the DB on the two different RAID sets, "halt"
> one to do the backup, then reinitialize it so that it would sync up
> with the other RAID set.
> Other than adding a secondary server to do data replication, I'm open
> to ideas. Even if it means moving the LOBs onto the file system. I
> just need a backup solution that scales well when we exceed 1TB.
> Thank you,
> Todd

Howdy,

With a database that size you probably want to consider performing hot
backups with continuous archiving:
http://www.postgresql.org/docs/9.0/static/continuous-archiving.html
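
To enable archiving (a minimal sketch; the WAL archive location is just
a placeholder), you'd put something like this in postgresql.conf:

    wal_level = archive                           # write WAL suitable for archiving (restart required)
    archive_mode = on                             # run the archiver process (restart required)
    archive_command = 'cp %p /mnt/backup/wal/%f'  # copy each completed WAL segment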

That way your system is never down for backups. You can also take a
full base backup on the weekend (or during a known slow time) and rely
on the archived WAL as your daily deltas, which brings the nightly
backup down to a much more manageable duration.
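
The weekend full backup then becomes a base backup taken while the
server stays up (paths here are just examples):

    psql -U postgres -c "SELECT pg_start_backup('weekly');"              # enter backup mode
    tar czf /mnt/backup/base-$(date +%Y%m%d).tar.gz /var/lib/pgsql/data  # copy the cluster while it runs
    psql -U postgres -c "SELECT pg_stop_backup();"                       # leave backup mode

tar may grumble about files changing underneath it; during a hot backup
that's expected, since WAL replay fixes things up on restore.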

Or, if you're on Linux/BSD/etc., you could rsync the cluster to another
server; rsync only transfers the changed portions of files, so it's not
bringing over 500GB every day.
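
A minimal sketch (hostname and paths are made up); note that the copy
is only safe to restore if you wrap it in backup mode, as above:

    psql -U postgres -c "SELECT pg_start_backup('rsync');"
    rsync -a --delete /var/lib/pgsql/data/ backuphost:/backups/pgdata/  # only deltas cross the wire
    psql -U postgres -c "SELECT pg_stop_backup();"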

If you can get a SAN with snapshot capability, you can put the database
into backup mode, snapshot it, and then take it back out, so the impact
of your entire backup is ~5 minutes regardless of size on disk.
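
Something along these lines (the snapshot command is hypothetical,
since the tooling depends entirely on your SAN vendor):

    psql -U postgres -c "SELECT pg_start_backup('snap');"  # backup mode on
    san-snap create pgdata-$(date +%Y%m%d)                 # hypothetical vendor CLI
    psql -U postgres -c "SELECT pg_stop_backup();"         # backup mode off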

All that said, I would advocate for taking the LOBs out of the database.
A database isn't a filesystem; it's never going to match a filesystem +
nginx (or something) for serving up files. The fact that your LOBs are
rarely updated strengthens the case for getting them out of the database.
My philosophy is: let the database handle what it's good at (joins,
transactions, updates, etc.) and let filesystems / webservers handle
what they're good at: serving up static content.
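
If you do go that route, the schema change is basically swapping the
OID column for a path (table and column names here are hypothetical):

    psql -U postgres -c "
        -- store a filesystem path instead of a large object OID
        CREATE TABLE documents (
            id        serial PRIMARY KEY,
            file_path text NOT NULL  -- e.g. '/srv/files/ab/cd/abcd1234.bin'
        );"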

Dave
