Re: Synch Rep for CommitFest 2009-07

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synch Rep for CommitFest 2009-07
Date: 2009-07-16 07:53:22
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> I think a better way to address that need is to provide a built-in
> mechanism for the standby to request a base backup and have it sent over
> the wire. That makes the initial setup very easy.

Great idea :) 

So I'll reproduce the sketch I did in this other mail, adding a 'base'
state where the prerequisite base backup is handled. That will help
clarify the next points:

 0. base: slave asks the master for a base-backup, at the end of this it
    reaches the base-lsn

 1. init: slave asks the master for the current LSN and starts
    streaming WAL

 2. setup: slave asks the master for the missing WALs from its base-lsn
    to the LSN it just got, and applies them all to reach the init LSN
    (this happens in parallel to 1.)

 3. catchup: slave has replayed the missing WALs and is now replaying
    the stream it received in parallel, which applies from the init LSN
    (just reached)

 4. sync: slave is applying the stream as it gets it, either as part of
    the master transaction or not depending on the GUC settings
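
The five states above can be sketched as a small state machine. This is
only an illustration of the list, not code from any patch: the names
StandbyState, NEXT_STATE, advance and startup_path are all made up for
the example, and the progression is simplified to be strictly linear
(the list says step 2 runs in parallel with step 1, which this sketch
does not model).

```python
from enum import Enum


class StandbyState(Enum):
    """Hypothetical names for the states listed in the mail."""
    BASE = 0     # taking the base backup, which ends at base-lsn
    INIT = 1     # asked the master for its current LSN, streaming started
    SETUP = 2    # fetching and replaying WAL from base-lsn to the init LSN
    CATCHUP = 3  # replaying the stream received since the init LSN
    SYNC = 4     # applying the stream as it arrives


# Assumption: a linear progression. The parallelism between INIT and
# SETUP described in the mail is deliberately left out.
NEXT_STATE = {
    StandbyState.BASE: StandbyState.INIT,
    StandbyState.INIT: StandbyState.SETUP,
    StandbyState.SETUP: StandbyState.CATCHUP,
    StandbyState.CATCHUP: StandbyState.SYNC,
}


def advance(state):
    """Return the state that follows `state`; SYNC maps to itself."""
    return NEXT_STATE.get(state, StandbyState.SYNC)


def startup_path():
    """Walk from BASE to SYNC and return every state visited, in order."""
    state = StandbyState.BASE
    path = [state]
    while state is not StandbyState.SYNC:
        state = advance(state)
        path.append(state)
    return path
```

Calling startup_path() walks a fresh standby through all five states in
the order of the list.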

> The situation arises also when the standby falls badly behind. A simple
> solution to that is to add a switch in the master to specify "always
> keep X MB of WAL in pg_xlog". The standby will then still find it in
> pg_xlog, making it harder for a standby to fall so much behind that it
> can't find the WAL it needs in the primary anymore. Tom suggested that
> we can just give up and re-sync with a new base backup, but that really
> requires built-in base backup capability, and is only practical for
> small databases.

I think that when the standby is back in business after a connection
glitch (or any other transient error), its current internal state is
still 'sync' and walreceiver asks for the next LSN (RedoPTR?). Two
cases are then possible:

 a. primary still has it handy, so the standby is still in sync but
    lagging behind (and primary knows how much)

 b. primary is not able to provide the requested WAL entry, so the slave
    is back to 'setup' state, with base-lsn being the point reached just
    before losing sync (the one walreceiver just asked for).

Now, a standby in 'setup' state isn't ready (yet), and for example
synchronous replication won't be possible in this state: we can't ask
the primary to refuse to COMMIT any transaction (holding it, e.g.) while
a standby hasn't reached 'sync' state.
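
To make cases a and b concrete, here is a minimal sketch of the decision
taken when the standby reconnects, plus the rule that only a standby in
'sync' state can take part in synchronous commit. Everything in it is an
assumption made for illustration: LSNs are plain integers, and Standby,
on_reconnect and may_hold_commit_for are hypothetical names, not
existing PostgreSQL functions.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    state: str     # 'setup' or 'sync' in this sketch
    base_lsn: int  # point from which missing WAL has to be replayed


def on_reconnect(standby, requested_lsn, oldest_available_lsn, current_lsn):
    """Apply case a or case b after a transient failure.

    Returns the lag (in LSN units) when the primary still has the
    requested WAL (case a), or None when the standby has to go back to
    'setup' (case b).
    """
    if requested_lsn >= oldest_available_lsn:
        # Case a: still in sync, only lagging; the primary knows by how much.
        standby.state = "sync"
        return current_lsn - requested_lsn
    # Case b: the WAL is gone from the primary, so restart from 'setup'
    # with base-lsn set to the LSN walreceiver just asked for.
    standby.state = "setup"
    standby.base_lsn = requested_lsn
    return None


def may_hold_commit_for(standby):
    """Only a standby in 'sync' state can be waited on at COMMIT."""
    return standby.state == "sync"
```

With this split, a lagging standby and an out-of-sync standby never
share a state, so the primary's response to each can differ.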

The way you're talking about the issue makes me think there's a mix-up
between how to handle a lagging standby and an out-of-sync standby. For
clarity, I think we should have very distinct states and responses. And
yes, as Tom and you keep saying, a synced standby by definition should
not need any access to its primary's archives. So if it does, it's no
longer in sync.

> I think we should definitely have both those features, but it's not
> urgent. The replication works without them, although requires that you
> set up traditional archiving as well.

Agreed, it's not essential for the feature as far as hackers are

