Re: Streaming replication, retrying from archive

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, retrying from archive
Date: 2010-01-16 10:36:19
Message-ID: m2ljfyqqu4.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Thanks for stating it this way, it really helps figuring out what is it
we're talking about!

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> The states with my suggested ReadRecord/FetchRecord refactoring, the
> code I have in the replication-xlogrefactor branch in my git repo,
> are:

They look like you're trying to solve a specific issue that is a
consequence of another one, without fixing the cause. I hope I'm wrong,
once more :)

> 1. Initial archive recovery. Standby fetches WAL files from archive
> using restore_command. When a file is not found in archive, we start
> walreceiver and switch to state 2
>
> 2. Retrying to restore from archive. When the connection to primary is
> established and replication is started, we switch to state 3

When do the master know about this new slave being there? I'd say not
until 3 is ok, and then, the actual details between 1 and 2 look
strange, partly because it's more about processes than states.

I'd propose to have 1 and 2 started in parallel from the beginning, and
as Simon proposes, being able to get back to 1. at any time:

0. start from a base backup, determine the first WAL / LSN we need to
start streaming, call it SR_LSN. That means asking the master its
current xlog location. The LSN we're at now, after replaying the base
backup and maybe the initial recovery from local WAL files, let's
call it BASE_LSN.

1. Get the missing WAL to get from BASE_LSN to SR_LSN from the archive,
with restore_command, apply them as we receive them, and start
2. possibly in parallel

2. Streaming replication: we connect to the primary and walreceiver gets
the WALs from the connection. It either stores them if current
standby's position < SR_LSN or apply them directly if we were already
streaming.

Local storage would be either standby's archiving or a specific
temporary location. I guess it's more or less what you want to do
with retrying from the master's archives, but I'm not sure your line
of though makes it simpler.

But that's more a process view, not a state view. As 1 and 2 run in
parallel, we're missing some state names. Let's name the states now that
we have the processes.

base: start from a base backup, which we don't know how we got it

catch-up: getting the WALs [from archive] to get from base to being able
to apply the streaming

wanna-sync: receiving primary's wal while not being able to replay them

do-sync: applying the wals we got in wanna-sync state

sync: replaying what's being sent as it arrives

So the current problem is what happens when we're not able to start
streaming from the primary, yet, or again. And your question is how will
it get simpler with all those details.

What I propose is to always have a walreceiver running and getting WALs
from the master. Depending on current state it's applying them (sync) or
keeping them for later (wanna-sync). We need some more code for it to
apply WALs it's been keeping for later (do-sync), that depends on how we
keep the WALs.

Your problem is getting out of catch-up up to sync, and which process is
doing what in between. I hope to make it clear to think about with my
proposal, and would go as far as to say that the startup process does
only care about getting the WALs from BASE_LSN to SR_LSN, that's called
catch-up.

Having another process to handle wanna-sync is neat, but can be
sequential too.

When you lose the connection, you get out of sync back to another state
depending on missing wals, so to know that you need to contact the
primary again.

The master only considers any standby's in sync if its walsender process
is up-to-date or lagging only the last emitted WAL. If lagging more,
that means the standby's is catching up, or replaying more than the
current WAL, so in wanna-sync or do-sync state. Not in sync.

The details about when a slave is in sync will get more important as
soon as we have synchronous streaming.

Regards,
--
dim

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2010-01-16 10:48:50 Re: mailing list archiver chewing patches
Previous Message Fabien COELHO 2010-01-16 09:55:57 Re: missing data in information_schema grant_* tables?