Re: Differential backup

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Hannu Krosing" <hannu(at)2ndquadrant(dot)com>, "Csaba Nagy" <ncslists(at)googlemail(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Merlin Moncure" <mmoncure(at)gmail(dot)com>, "Michael Tharp" <gxti(at)partiallystapled(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Differential backup
Date: 2010-04-28 16:28:39
Message-ID: 4BD81BE70200002500030FEA@gw.wicourts.gov
Lists: pgsql-hackers

Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> On Tue, 2010-04-27 at 17:28 +0200, Csaba Nagy wrote:

>> One use case we would have is to dump only the changes from the
>> last backup of a single table. This table takes 30% of the DB
>> disk space, it is in the order of ~400GB, and it's only inserted,
>> never updated, then after ~1 year the old entries are archived.
>> There's ~10M new entries daily in this table. If the backup would
>> be smart enough to only read the changed blocks (in this case
>> only for newly inserted records), it would be a fairly big win...

That is covered pretty effectively in PITR-style backups with the
hard link and rsync approach cited earlier in the thread. Those 1GB
table segment files which haven't changed aren't read or written,
and only those portions of the other files which have actually
changed are sent over the wire (although the entire disk file is
written on the receiving end).
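
For anyone who hasn't seen the hard link half of that approach, here is a
minimal Python sketch of the idea at whole-file granularity. It is an
illustration only, not what rsync does internally: the function names
(snapshot, _unchanged) and the "same size and same mtime means unchanged"
test are assumptions made for the example, and it copies a changed file
whole rather than sending only the changed portions. In practice this is
roughly what rsync's --link-dest option gives you.

```python
import os
import shutil


def _unchanged(a, b):
    """Hypothetical quick check: same size and same mtime (whole seconds)."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def snapshot(source_dir, previous_dir, new_dir):
    """Build new_dir so that it looks like a full copy of source_dir.

    Files that match the copy in previous_dir are hard-linked to it, so
    their data is neither read nor written again. Everything else is
    copied. Pass previous_dir=None for the first (full) backup.
    Returns (linked, copied) as lists of paths relative to source_dir.
    """
    linked, copied = [], []
    for root, _dirs, files in os.walk(source_dir):
        rel_root = os.path.relpath(root, source_dir)
        os.makedirs(os.path.normpath(os.path.join(new_dir, rel_root)),
                    exist_ok=True)
        for name in sorted(files):
            rel = os.path.normpath(os.path.join(rel_root, name))
            src = os.path.join(source_dir, rel)
            dst = os.path.join(new_dir, rel)
            old = os.path.join(previous_dir, rel) if previous_dir else None
            if old and os.path.isfile(old) and _unchanged(src, old):
                os.link(old, dst)        # unchanged segment: share the data
                linked.append(rel)
            else:
                shutil.copy2(src, dst)   # copy2 keeps mtime for the next run
                copied.append(rel)
    return linked, copied
```

With an insert-only table, only the newest segment files grow between
runs, so each new snapshot directory costs roughly the size of those
files while still looking like a complete copy.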

> The standard trick for this kind of table is having this table
> partitioned by insertion date

That doesn't always work. In our situation the supreme court sets
records retention rules which can be quite complex, but usually key
on *final disposition* of a case rather than insertion date; that
is, the earliest date on which the data related to a case is
*allowed* to be deleted isn't known until weeks or years after
insertion. Additionally, it is the elected clerk of court in each
county who determines when and if data for that county will be
purged once it has reached the minimum retention threshold set by
supreme court rules.

That's not to say that partitioning couldn't help with some backup
strategies; just that it doesn't solve all "insert-only" (with
eventual purge) use cases. One of the nicest things about
PostgreSQL is the availability of several easy and viable backup
strategies, so that you can tailor one to fit your environment.

-Kevin
