Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
Date: 2009-07-07 19:00:02
Message-ID: 4A539B32.1030405@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
>> On Tue, Jul 7, 2009 at 4:49 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> This design seems totally wrong to me.
>>> ...
>
>> But this conflicts with earlier discussions where we were concerned
>> about the length of the path wal has to travel between the master and
>> the slaves. We want slaves to be able to be turned on using a
>> simple, robust configuration and to be able to respond quickly to
>> transactions that are committed in the master for synchronous
>> operation.
>
> Well, the problem I've really got with this is that if you want sync
> replication, couching it in terms of WAL files in the first place seems
> like getting off on fundamentally the wrong foot. That still leaves you
> with all the BS about having to force WAL file switches (and eat LSN
> space) for all sorts of undesirable reasons. I think we want the
> API to operate more like a WAL stream.

I think we all agree on that.

> I would envision the slaves
> connecting to the master's replication port and asking "feed me WAL
> beginning at LSN position thus-and-so", with no notion of WAL file
> boundaries exposed anyplace.

Yep, that's the way I envisioned it to work in my protocol suggestion
that Fujii adopted
(http://archives.postgresql.org/message-id/4951108A.5040608@enterprisedb.com).
The <begin> and <end> values are XLogRecPtrs, not WAL filenames.
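To make the "positions, not filenames" point concrete, here is a minimal C sketch of such a start request. Everything in it is hypothetical and not part of the proposed protocol: the one-byte 'S' message type, the 8-byte big-endian encoding, and the flat 64-bit WalPos type. (The 2009 server represented XLogRecPtr as two 32-bit fields; flattening it into one integer is a simplification.) The slave sends only a position. Only the master maps that position onto a segment number and offset, here assuming 16 MB segments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat WAL position (simplified stand-in for XLogRecPtr). */
typedef uint64_t WalPos;

#define WAL_SEG_SIZE  ((uint64_t) 16 * 1024 * 1024)  /* assumed 16 MB segments */
#define MSG_START     'S'                            /* hypothetical message type */
#define START_MSG_LEN 9                              /* 1 type byte + 8-byte position */

/* Slave side: encode "feed me WAL beginning at <start>" in big-endian order. */
size_t build_start_request(WalPos start, unsigned char *buf)
{
    assert(buf != NULL);
    buf[0] = MSG_START;
    for (int i = 0; i < 8; i++)
        buf[1 + i] = (unsigned char) (start >> (56 - 8 * i));
    return START_MSG_LEN;
}

/* Master side: decode the request; returns 0 for a malformed message. */
int parse_start_request(const unsigned char *buf, size_t len, WalPos *start)
{
    assert(buf != NULL && start != NULL);
    if (len != START_MSG_LEN || buf[0] != MSG_START)
        return 0;
    WalPos pos = 0;
    for (int i = 0; i < 8; i++)
        pos = (pos << 8) | buf[1 + i];
    *start = pos;
    return 1;
}

/* Only the master translates a position into its own segment files. */
uint64_t segment_number(WalPos pos) { return pos / WAL_SEG_SIZE; }
uint64_t segment_offset(WalPos pos) { return pos % WAL_SEG_SIZE; }
```

Because segment boundaries never appear on the wire, the master could change its segment size or file layout without affecting the slave.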

> The point about not wanting to archive
> lots of WAL on the master would imply that the master reserves the right
> to fail if the requested starting position is too old, whereupon the
> slave needs some way to resync --- but that probably involves something
> close to taking a fresh base backup to copy to the slave.

Works for me, except that people will want the ability to use a PITR
archive for the catchup, if available. The master should have no
business business peeking into the archive, however. That should be
implemented entirely in the slave.
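A minimal sketch of that slave-only decision, in C. The names, the idea that the master's refusal reports the oldest position it can still serve, and the rule that the archive must reach at least that far (so the slave can switch back to streaming from the master afterwards) are all assumptions for illustration. The sketch also ignores the master's retention point advancing while the slave replays from the archive.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t WalPos;   /* hypothetical flat WAL position */

/* Where the slave should fetch the WAL it needs next. */
typedef enum
{
    SOURCE_MASTER,       /* master still has it: stream directly */
    SOURCE_ARCHIVE,      /* master refused, local PITR archive covers the gap */
    SOURCE_BASE_BACKUP   /* neither has it: take a fresh base backup */
} CatchupSource;

/*
 * Decided entirely on the slave; the master never looks at the archive.
 * master_oldest: oldest position the master says it can serve (hypothetical).
 * The archive, if present, covers [archive_start, archive_end).
 */
CatchupSource choose_catchup_source(WalPos want, WalPos master_oldest,
                                    int have_archive,
                                    WalPos archive_start, WalPos archive_end)
{
    assert(!have_archive || archive_start <= archive_end);

    if (want >= master_oldest)
        return SOURCE_MASTER;

    /* Archive must contain 'want' and reach the master's oldest position. */
    if (have_archive && want >= archive_start && archive_end >= master_oldest)
        return SOURCE_ARCHIVE;

    return SOURCE_BASE_BACKUP;
}
```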

And I'm sure people will want the option to retain WAL longer in the
master, to avoid an expensive resync if the slave falls behind. It would
be simple to provide a GUC option for "always retain X GB of old WAL in
pg_xlog".
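As a sketch of how the master might combine such an option with the "fail if the requested starting position is too old" rule. This is written in C under stated assumptions: keep_gb stands in for the hypothetical GUC, segments are assumed to be 16 MB and recycled whole, and WAL that must be kept for crash recovery is ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t WalPos;   /* hypothetical flat WAL position */

#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)   /* assumed 16 MB segments */
#define GB           ((uint64_t) 1024 * 1024 * 1024)

/*
 * Oldest position still in pg_xlog under a hypothetical
 * "always retain keep_gb GB of old WAL" setting. Segments are recycled
 * whole, so round down to a segment boundary (keeping slightly more).
 */
WalPos oldest_retained(WalPos current_end, uint64_t keep_gb)
{
    uint64_t keep_bytes = keep_gb * GB;
    WalPos oldest = (current_end > keep_bytes) ? current_end - keep_bytes : 0;
    return oldest - (oldest % WAL_SEG_SIZE);
}

/* 1 = master can serve from 'requested'; 0 = refuse, slave must resync. */
int can_serve_from(WalPos requested, WalPos current_end, uint64_t keep_gb)
{
    assert(WAL_SEG_SIZE > 0);
    return requested >= oldest_retained(current_end, keep_gb)
        && requested <= current_end;
}
```

A larger keep_gb trades master disk space for a lower chance that a lagging slave needs the expensive resync.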

> There are still some interesting questions in this about exactly how you
> switch over from "catchup mode" to following the live WAL broadcast.
> With the above design it would be the master's responsibility to manage
> that, since presumably the requested start position will almost always
> be somewhat behind the live end of WAL. It might be nicer to push that
> complexity to the slave side, but then you do need two data paths
> somehow (ie, retrieving the slightly-stale WAL is separated from
> tracking live events). Which is what you're saying we should avoid,
> and I do see the point there.

Yeah, that logic belongs to the master.

We'll want to send a message from the master to the slave when the catchup
is done, so that the slave knows it's up to date. For logging, if for no
other reason.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
