Re: Synchronous Log Shipping Replication

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous Log Shipping Replication
Date: 2008-09-10 08:57:14
Message-ID: 1221037034.3913.621.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Wed, 2008-09-10 at 10:06 +0200, Markus Wanner wrote:
> Hi,
>
> Simon Riggs wrote:
> > 1. Standby contacts primary and says it would like to catch up, but is
> > currently at point X (which is a point at, or after the first consistent
> > stopping point in WAL after standby has performed its own crash
> > recovery, if any was required).
> > 2. primary initiates data transfer of old data to standby, starting at
> > point X
> > 3. standby tells primary where it has got to periodically
> > 4. at some point primary decides primary and standby are close enough
> > that it can now begin streaming "current WAL" (which is always the WAL
> > up to wal_buffers behind the current WAL insertion point).
>
> Hm.. wouldn't it be simpler, to start streaming right away and "cache"

The standby server won't come up until you have:
* copied the base backup
* sent it to standby server
* brought up the standby, so that it realises it is a replication partner
and begins requesting WAL from the primary (in some way)

There will be a gap (probably) between the initial WAL files and the
current tail of wal_buffers by the time all of the above has happened.
We will then need to copy more WAL across until we get to a point where
the most recent WAL record available on standby is ahead of the tail of
wal_buffers on primary so that streaming can start.

If we start caching WAL right away we would need to have two receivers.
One to receive the missing WAL data and one to receive the current WAL
data. We can't apply the WAL until we have the earlier missing WAL data,
so caching it seems difficult. On a large server this might be GBs of
data. Seems easier to not cache current WAL and to have just a single
WALReceiver process that performs a mode change once it has caught up.
(And I should say "if it catches up", since in practical terms it might
never catch up; that depends upon the relative power of the servers
involved.) So there's no need to store more WAL on standby than is
required to restart recovery from the last restartpoint, i.e. we stream
WAL at all times, not just in normal running mode.
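
To make the single-receiver mode change concrete, here is a rough sketch
in Python. It is only an illustration under assumptions, not proposed
code: the names Primary, WalReceiver, buffer_tail and the two mode
constants are made up for this example, WAL positions are plain byte
offsets, and "caught up" is taken to mean that the standby's received
position has reached the tail of wal_buffers on the primary.

```python
from dataclasses import dataclass

# Hypothetical mode names, for illustration only.
CATCHUP = "catchup"
STREAMING = "streaming"


@dataclass
class Primary:
    insert_ptr: int   # current WAL insertion point, as a byte offset
    wal_buffers: int  # bytes of recent WAL assumed to be held in wal_buffers

    def buffer_tail(self) -> int:
        # Oldest WAL position that can still be served from wal_buffers.
        return max(0, self.insert_ptr - self.wal_buffers)


@dataclass
class WalReceiver:
    received_ptr: int    # how far the standby has received WAL
    mode: str = CATCHUP  # a single receiver; it only ever changes mode

    def receive(self, primary: Primary, nbytes: int) -> None:
        # The standby can never receive more than the primary has inserted.
        self.received_ptr = min(self.received_ptr + nbytes, primary.insert_ptr)
        # Switch to streaming once we have reached the tail of wal_buffers.
        if self.mode == CATCHUP and self.received_ptr >= primary.buffer_tail():
            self.mode = STREAMING
```

If the primary generates WAL faster than the standby can receive it,
buffer_tail() keeps moving ahead of received_ptr and the receiver simply
stays in catch-up mode, which is the "if it catches up" case above.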

Seems easiest to have:
* Startup process only reads next WAL record when the ReceivedLogPtr >
ReadRecPtr, so it knows nothing of how WAL is received. Startup process
reads directly from WAL files in *all* cases. ReceivedLogPtr is in
shared memory and accessed via spinlock. Startup process only ever reads
this pointer. (Notice that Startup process is modeless).
* WALReceiver reads data from primary and writes it to WAL files,
fsyncing (if ever requested to do so). WALReceiver updates
ReceivedLogPtr.

That is much simpler and more modular. Buffering of the WAL files is
handled by filesystem buffering.
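
As a rough sketch of the split described above, assuming a Python list
stands in for the WAL files, a threading.Lock stands in for the
spinlock, and pointers count whole records rather than bytes. The
ReceivedLogPtr class and the two helper functions are hypothetical names
for this example only.

```python
import threading


class ReceivedLogPtr:
    """Shared pointer: written by WALReceiver, only read by Startup."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stand-in for the spinlock
        self._value = 0

    def set(self, value: int) -> None:
        with self._lock:
            self._value = value

    def get(self) -> int:
        with self._lock:
            return self._value


def wal_receiver_append(wal_file: list, shared: ReceivedLogPtr, records: list) -> None:
    # Write the received records to the WAL file first, then publish the
    # new position, so Startup never sees a pointer past written data.
    wal_file.extend(records)
    shared.set(len(wal_file))


def startup_read_next(wal_file: list, shared: ReceivedLogPtr, read_rec_ptr: int):
    # Startup reads from the WAL file in all cases; it only checks that
    # ReceivedLogPtr > ReadRecPtr and knows nothing about how WAL arrived.
    if shared.get() > read_rec_ptr:
        return wal_file[read_rec_ptr], read_rec_ptr + 1
    return None, read_rec_ptr  # nothing new yet; caller retries later
```

The Startup side has no mode at all in this sketch: whether the records
arrived during catch-up or during streaming, it reads them the same way.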

If the standby crashes, all data is safely written to WAL files and we
restart from the correct place.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
