Re: Streaming replication, retrying from archive

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, retrying from archive
Date: 2010-01-15 18:56:41
Message-ID: 1263581801.26654.37627.camel@ebony
Lists: pgsql-hackers

On Fri, 2010-01-15 at 20:11 +0200, Heikki Linnakangas wrote:

> The states we have at the moment in standby are:
>
> 1. Archive recovery. Standby fetches WAL files from archive using
> restore_command. When a file is not found in archive, we switch to state 2
>
> 2. Streaming replication. Standby connects (and reconnects if the
> connection is lost for any reason) to the primary, starts streaming, and
> applies WAL as it arrives. We stay in this state until trigger file is
> found or server is shut down.

> The states with my suggested ReadRecord/FetchRecord refactoring, the
> code I have in the replication-xlogrefactor branch in my git repo, are:
>
> 1. Initial archive recovery. Standby fetches WAL files from archive
> using restore_command. When a file is not found in archive, we start
> walreceiver and switch to state 2
>
> 2. Retrying to restore from archive. When the connection to primary is
> established and replication is started, we switch to state 3
>
> 3. Streaming replication. Connection to primary is established, and WAL
> is applied as it arrives. When the connection is dropped, we go back to
> state 2
>
> Although the state transitions between 2 and 3 are a bit fuzzy in
> that version; walreceiver runs concurrently, trying to reconnect, while
> startup process retries restoring from archive. Fujii-san's suggestion
> to have walreceiver stop while startup process retries restoring from
> archive (or have walreceiver run restore_command in approach #2) would
> make that clearer.
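
To make the quoted three-state proposal concrete, here is a minimal sketch of its transition table. The names (State, Event, next_state) are hypothetical and chosen for illustration only; they are not taken from the replication-xlogrefactor branch or from PostgreSQL itself.

```python
from enum import Enum, auto


class State(Enum):
    INITIAL_ARCHIVE_RECOVERY = auto()  # state 1 in the quoted list
    RETRY_ARCHIVE = auto()             # state 2
    STREAMING = auto()                 # state 3


class Event(Enum):
    FILE_NOT_IN_ARCHIVE = auto()  # restore_command found no file
    STREAMING_STARTED = auto()    # connected to primary, replication started
    CONNECTION_LOST = auto()      # connection to primary dropped


# Transitions as described in the quoted proposal (illustrative only).
TRANSITIONS = {
    (State.INITIAL_ARCHIVE_RECOVERY, Event.FILE_NOT_IN_ARCHIVE): State.RETRY_ARCHIVE,
    (State.RETRY_ARCHIVE, Event.STREAMING_STARTED): State.STREAMING,
    (State.STREAMING, Event.CONNECTION_LOST): State.RETRY_ARCHIVE,
}


def next_state(state, event):
    """Return the new state, or the same state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Note that no entry in the table leads back to INITIAL_ARCHIVE_RECOVERY, which is the one-way 1->2 transition discussed below.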

The one-way state transition 1->2 in both cases seems to make
this a little more complex, rather than simpler.

If the connection did drop then WAL will be in the archive, so the path
for data is primary->archive->standby. There already needs to be a
network path between archive and standby, so why not drop back from
state 3 -> 1 rather than from 3 -> 2? That way we could have just 2
states on each side, rather than 3.
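
As a rough sketch of the two-state idea, under one possible reading of it: archive recovery falls through to streaming when a file is not found, and streaming drops back to archive recovery when the connection is lost. Mode and step are hypothetical names for illustration, not PostgreSQL code.

```python
from enum import Enum


class Mode(Enum):
    ARCHIVE = "archive recovery"
    STREAMING = "streaming replication"


def step(mode, archive_has_next_file, connected):
    """Pick the next mode for the standby (illustrative only).

    archive_has_next_file: restore_command would find the next WAL file.
    connected: the connection to the primary is up.
    """
    if mode is Mode.ARCHIVE:
        # Keep restoring while the archive has files; otherwise try streaming.
        return Mode.ARCHIVE if archive_has_next_file else Mode.STREAMING
    # Streaming: stay while connected; on a drop, go straight back to the archive.
    return Mode.STREAMING if connected else Mode.ARCHIVE
```

With this shape there is no separate "retrying" state: if the archive is exhausted and the primary is unreachable, the standby simply alternates between the two modes until one of them makes progress.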

--
Simon Riggs www.2ndQuadrant.com
