Re: Hard link backup strategy

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Michael Monnerie" <michael(dot)monnerie(at)is(dot)it-management(dot)at>, <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Hard link backup strategy
Date: 2009-03-30 17:20:43
Message-ID: 49D0B91B.EE98.0025.0@wicourts.gov
Lists: pgsql-admin

Michael Monnerie <michael(dot)monnerie(at)is(dot)it-management(dot)at> wrote:
> On Thursday 26 March 2009 Kevin Grittner wrote:
>> (1) Our archive script copies WAL files to a directory on the
>> database server, using cp to one directory followed by mv to
>> another (to prevent partial files from being processed).
>
> What "partial files" do you speak about? I'd like to know, as we
> soon start doing WAL copies.

I don't want anything to attempt to process a WAL file while it is in
the process of being copied. By copying to a separate directory on
the same mount point and moving it, once complete, to the location
where other software is looking for WAL files, we avoid that problem.
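
As a minimal sketch of that copy-then-move step (not our actual archive
script; the function name and the two directory arguments are
hypothetical), the same idea in Python looks roughly like this. It
assumes both directories are on the same mount point, since the rename
is only atomic within one filesystem.

```python
import os
import shutil


def archive_wal(wal_path, staging_dir, ready_dir):
    """Copy one WAL file into staging_dir, then rename it into ready_dir.

    staging_dir and ready_dir are hypothetical names; they are assumed to
    live on the same filesystem so the final rename is atomic.
    """
    name = os.path.basename(wal_path)
    staged = os.path.join(staging_dir, name)
    final = os.path.join(ready_dir, name)

    # The slow part: while this copy runs, the partial file is visible
    # only in the staging directory, which no consumer is watching.
    shutil.copyfile(wal_path, staged)

    # The fast part: a rename within one filesystem, so consumers see
    # either no file or the complete file in ready_dir.
    os.replace(staged, final)
    return final
```

Anything polling ready_dir therefore never opens a half-written file.
Note that os.replace silently overwrites an existing file of the same
name; a real archive script would probably want to refuse that instead.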

> You mean you make the hard link both on client and server, and
> afterwards transfer with rsync? That only works for postgresql
> <8.3, right? As I understand, 8.3 will reuse the space from updated
> tuples and therefore lots of changes in-between will be done.

PostgreSQL version has nothing to do with it. PostgreSQL segments a
table into 1GB files. Our largest tables are either "insert only"
or rarely have updates or deletes, so many of these table segment
files are unchanged from one base backup to the next. What we're
trying to accomplish is to use Linux hard links to create directory
entries from multiple base backups which point to the same file body.
Do a web search for "linux hard links" if this doesn't make sense to
you.
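
To illustrate the hard link idea, here is a rough sketch in Python. It
is not the procedure we run; the function name and the backup directory
arguments are made up for the example, and it assumes the previous and
new backup directories are on the same filesystem, because hard links
cannot cross mount points.

```python
import filecmp
import os
import shutil


def link_or_copy(src, prev_backup, new_backup):
    """Place src into new_backup, sharing disk space where possible.

    prev_backup and new_backup are hypothetical backup directories on
    the same filesystem. Returns "linked" or "copied".
    """
    name = os.path.basename(src)
    prev = os.path.join(prev_backup, name)
    dest = os.path.join(new_backup, name)

    # Byte-for-byte comparison (shallow=False), so a file that merely
    # has the same size and timestamp is not mistaken for unchanged.
    if os.path.exists(prev) and filecmp.cmp(src, prev, shallow=False):
        # New directory entry pointing at the same file body; no extra
        # data blocks are used.
        os.link(prev, dest)
        return "linked"

    # The segment changed (or is new), so it needs its own copy.
    shutil.copy2(src, dest)
    return "copied"
```

For an unchanged 1GB segment file, both backups then hold a directory
entry for the same file body, and the space is freed only when the last
backup referring to it is deleted.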

The benefit of rsync is that, when you use a daemon as we do, it won't
send unchanged portions of a file over the wire. In preliminary tests
our largest county would only send about 10% of the full set of
data if we used rsync on a snapshot of the original data directory,
without using cpio or gzip first. I'm hoping that if we use the warm
standby image as the base, rather than the previous week's backup, it
will be a fraction of that.

Basically, the hard links will be used to conserve disk space; rsync
will be used to conserve network bandwidth.
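
For what it's worth, rsync can combine the two: its --link-dest option
hard-links files that are unchanged relative to a reference directory
instead of storing a second copy. The sketch below only builds the
command line; it is an illustration, not the invocation we use, and the
daemon module name and paths in the usage note are invented.

```python
def rsync_backup_command(source, dest_dir, link_dest=None):
    """Build an rsync argument list for pulling a base backup.

    source may be a local path or an rsync daemon URL of the form
    host::module/path. link_dest, if given, is a previous backup
    directory on the receiving side; use an absolute path, since rsync
    interprets a relative one relative to dest_dir.
    """
    cmd = ["rsync", "--archive"]
    if link_dest is not None:
        # Unchanged files become hard links into link_dest.
        cmd.append("--link-dest=" + link_dest)
    cmd += [source, dest_dir]
    return cmd
```

Calling rsync_backup_command("dbserver::pgdata/", "/backup/new/",
link_dest="/backup/prev") gives a list that could be handed to
subprocess.run; the host, module, and directories there are placeholders.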

-Kevin
