Re: Synch Rep for CommitFest 2009-07

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Rick Gigger <rick(at)alpinenetworking(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synch Rep for CommitFest 2009-07
Date: 2009-07-16 19:47:54
Message-ID: 603c8f070907161247y486bba9di8178d9dcf681f367@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 16, 2009 at 1:09 PM, Greg Stark<gsstark(at)mit(dot)edu> wrote:
> On Thu, Jul 16, 2009 at 4:41 PM, Heikki
> Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Rick Gigger wrote:
>>> If you use an rsync like algorithm for doing the base backups wouldn't
>>> that increase the size of the database for which it would still be
>>> practical to just re-sync?  Couldn't you in fact sync a very large
>>> database if the amount of actual change in the files was a small
>>> percentage of the total size?
>>
>> It would certainly help to reduce the network traffic, though you'd
>> still have to scan all the data to see what has changed.
>
> The fundamental problem with pushing users to start over with a new
> base backup is that there's no relationship between the size of the
> WAL and the size of the database.
>
> You can plausibly have a system with extremely high transaction rate
> generating WAL very quickly, but where the whole database fits in a
> few hundred megabytes. In that case you could be behind by only a few
> minutes and have it be faster to take a new base backup.
>
> Or you could have a petabyte database which is rarely updated. In
> which case it might be faster to apply weeks' worth of logs than to
> try to take a base backup.
>
> Only the sysadmin is actually going to know which makes more sense.
> Unless we start tying WAL parameters to the database size or
> something like that.
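
To make Heikki's point above concrete: a rough, hedged sketch of the
block-comparison idea (not real rsync, and pretending for simplicity that
both copies are local files -- in reality only the standby's checksums
would cross the wire). The file names and the toy checksum are invented
for illustration. The thing to notice is that every block of both copies
still gets read and checksummed even when nothing changed; only the
network transfer shrinks.

/* Sketch of rsync-style block comparison, not actual rsync.  Both
 * copies are scanned in full (the unavoidable cost Heikki mentions),
 * but only blocks whose checksums differ would need to be shipped. */
#include <stdint.h>
#include <stdio.h>

#define BLOCKSZ 8192

static uint32_t
block_checksum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 5381;        /* toy hash; rsync really uses a rolling
                                 * checksum plus a strong digest */
    for (size_t i = 0; i < len; i++)
        sum = (sum << 5) + sum + buf[i];
    return sum;
}

int
main(int argc, char **argv)
{
    if (argc != 3)
    {
        fprintf(stderr, "usage: %s master-copy standby-copy\n", argv[0]);
        return 1;
    }

    FILE *master = fopen(argv[1], "rb");
    FILE *standby = fopen(argv[2], "rb");
    if (!master || !standby)
    {
        perror("fopen");
        return 1;
    }

    unsigned char mbuf[BLOCKSZ], sbuf[BLOCKSZ];
    long    blkno = 0, differing = 0;
    size_t  mlen, slen;

    /* Every block of both files is read, even if nothing changed. */
    while ((mlen = fread(mbuf, 1, BLOCKSZ, master)) > 0)
    {
        slen = fread(sbuf, 1, BLOCKSZ, standby);
        if (mlen != slen ||
            block_checksum(mbuf, mlen) != block_checksum(sbuf, slen))
        {
            printf("block %ld would be sent (%zu bytes)\n", blkno, mlen);
            differing++;
        }
        blkno++;
    }

    printf("%ld of %ld blocks differ\n", differing, blkno);
    fclose(master);
    fclose(standby);
    return 0;
}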

I think we need a way for the master to know who its slaves are and
keep any given bit of WAL available until all slaves have successfully
read it, just as we keep each WAL file until we successfully copy it
to the archive. Otherwise, there's no way to be sure that a
connection break won't result in the need for a new base backup. (In
a way, a slave is very similar to an additional archive.)
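
Very roughly, the bookkeeping I have in mind looks like the sketch below.
All the names are invented and nothing like this exists in the tree
today; it is only meant to illustrate the idea, not to be a patch. The
master remembers each registered slave's confirmed receive position and
refuses to recycle any WAL segment the slowest slave still needs, the
same way it holds a segment until archive_command has succeeded.

/* Sketch only: invented structures and names.  The bookkeeping is the
 * same as for the archive -- a WAL segment is only recycled once nobody
 * still needs it -- except "nobody" here means the slowest registered
 * slave rather than archive_command. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t FakeXLogRecPtr;    /* stand-in for XLogRecPtr */

typedef struct SlaveInfo
{
    const char     *name;           /* how the slave registered itself */
    FakeXLogRecPtr  confirmed;      /* last WAL position safely received */
} SlaveInfo;

/* WAL up to the slowest slave's confirmed position can be recycled;
 * everything after it must be kept, connection break or not. */
static FakeXLogRecPtr
oldest_needed_wal(const SlaveInfo *slaves, int nslaves)
{
    FakeXLogRecPtr oldest = UINT64_MAX;

    for (int i = 0; i < nslaves; i++)
        if (slaves[i].confirmed < oldest)
            oldest = slaves[i].confirmed;
    return oldest;
}

int
main(void)
{
    SlaveInfo slaves[] = {
        {"reporting", UINT64_C(0x2F000000)},
        {"dr-site",   UINT64_C(0x1A000000)},   /* lagging slave pins the WAL */
    };

    printf("cannot recycle WAL beyond %08" PRIX64 "\n",
           oldest_needed_wal(slaves, 2));
    return 0;
}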

...Robert
