
Re: Streaming replication, retrying from archive

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, retrying from archive
Date: 2010-01-15 18:56:41
Message-ID: 1263581801.26654.37627.camel@ebony
Lists: pgsql-hackers
On Fri, 2010-01-15 at 20:11 +0200, Heikki Linnakangas wrote:

> The states we have at the moment in standby are:
> 
> 1. Archive recovery. The standby fetches WAL files from the archive using
> restore_command. When a file is not found in the archive, we switch to state 2.
> 
> 2. Streaming replication. The standby connects (and reconnects if the
> connection is lost for any reason) to the primary, starts streaming, and
> applies WAL as it arrives. We stay in this state until the trigger file is
> found or the server is shut down.
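The current two-state behaviour can be written as a tiny transition function. This is a minimal sketch, assuming a toy Python model: `State`, `next_state`, the boolean inputs, and the `PROMOTED` label are hypothetical and are not PostgreSQL's startup-process code. It shows that state 1 is left only when the archive runs dry, and that a lost connection never leaves state 2.

```python
from enum import Enum, auto

# Hypothetical toy model of the standby states quoted above; not PostgreSQL code.
class State(Enum):
    ARCHIVE_RECOVERY = auto()  # 1. fetch WAL files with restore_command
    STREAMING = auto()         # 2. stream WAL from the primary and apply it
    PROMOTED = auto()          # hypothetical label: trigger file found, recovery ends

def next_state(state: State, file_in_archive: bool, trigger_found: bool) -> State:
    """Return the standby's next state given two hypothetical observations."""
    if state is State.ARCHIVE_RECOVERY:
        # A missing file in the archive is the only way out of state 1.
        return State.ARCHIVE_RECOVERY if file_in_archive else State.STREAMING
    if state is State.STREAMING:
        # Lost connections are handled by reconnecting inside this state;
        # only the trigger file ends it (shutdown is not modelled here).
        return State.PROMOTED if trigger_found else State.STREAMING
    return state
```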

> The states with my suggested ReadRecord/FetchRecord refactoring, the
> code I have in the replication-xlogrefactor branch in my git repo, are:
> 
> 1. Initial archive recovery. The standby fetches WAL files from the archive
> using restore_command. When a file is not found in the archive, we start
> walreceiver and switch to state 2.
> 
> 2. Retrying restore from the archive. When the connection to the primary is
> established and replication has started, we switch to state 3.
> 
> 3. Streaming replication. The connection to the primary is established, and WAL
> is applied as it arrives. When the connection is dropped, we go back to
> state 2.
> 
> The state transitions between 2 and 3 are a bit fuzzy in that version,
> though: walreceiver runs concurrently, trying to reconnect, while the
> startup process retries restoring from the archive. Fujii-san's suggestion
> to have walreceiver stop while the startup process retries restoring from
> the archive (or to have walreceiver run restore_command, in approach #2)
> would make that clearer.
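The proposed three-state refactoring can be sketched the same way. This is a minimal sketch, assuming a toy Python model: the names `State`, `next_state`, `file_in_archive`, and `streaming_up` (meaning "connection to the primary established and replication running") are hypothetical, and the concurrency between walreceiver and the startup process is collapsed into that single flag. The point it makes visible is that state 1 is never re-entered once it is left.

```python
from enum import Enum, auto

# Hypothetical toy model of the refactored standby states; not PostgreSQL code.
class State(Enum):
    INITIAL_ARCHIVE = auto()  # 1. initial archive recovery via restore_command
    RETRY_ARCHIVE = auto()    # 2. retry the archive while walreceiver tries to connect
    STREAMING = auto()        # 3. apply WAL streamed from the primary

def next_state(state: State, file_in_archive: bool, streaming_up: bool) -> State:
    """Return the next state; streaming_up is a hypothetical 'replication running' flag."""
    if state is State.INITIAL_ARCHIVE:
        # Running out of archived WAL starts walreceiver and moves to state 2.
        return State.INITIAL_ARCHIVE if file_in_archive else State.RETRY_ARCHIVE
    if state is State.RETRY_ARCHIVE:
        # Keep retrying restore_command until replication is running.
        return State.STREAMING if streaming_up else State.RETRY_ARCHIVE
    # STREAMING: a dropped connection goes back to state 2, never to state 1.
    return State.STREAMING if streaming_up else State.RETRY_ARCHIVE
```

Note that `file_in_archive` is ignored outside state 1: in this model, restoring a file while retrying does not change the state, which is one reading of the "fuzzy" 2/3 boundary described above.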

The one-way 1->2 state transitions in both cases seem to make this a
little more complex, not simpler.

If the connection does drop, the WAL will be in the archive, so the path
for the data is primary->archive->standby. There already needs to be a
network path between the archive and the standby, so why not drop back
from state 3 -> 1 rather than from 3 -> 2? That way we could have just 2
states on each side, rather than 3.
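The two-state alternative suggested here can be modelled as follows. This is a minimal sketch, assuming a toy Python model: `State`, `next_state`, `trace`, and the `(file_in_archive, connected)` event tuples are hypothetical names for illustration, not PostgreSQL code. A dropped connection sends the standby straight back to archive recovery (3 -> 1), and running out of archived WAL sends it to streaming again.

```python
from enum import Enum, auto

# Hypothetical two-state toy model of the proposal; not PostgreSQL code.
class State(Enum):
    ARCHIVE = auto()    # restore WAL segments with restore_command
    STREAMING = auto()  # receive WAL from the primary via walreceiver

def next_state(state: State, file_in_archive: bool, connected: bool) -> State:
    """On disconnect, fall back to archive recovery instead of a separate retry state."""
    if state is State.ARCHIVE:
        # Keep restoring while the archive has the next segment;
        # once it runs out, try streaming from the primary.
        return State.ARCHIVE if file_in_archive else State.STREAMING
    # STREAMING: a dropped connection sends us back to the archive,
    # since the primary will have archived the WAL we missed.
    return State.STREAMING if connected else State.ARCHIVE

def trace(events):
    """Apply a list of hypothetical (file_in_archive, connected) observations."""
    state = State.ARCHIVE
    states = [state]
    for file_in_archive, connected in events:
        state = next_state(state, file_in_archive, connected)
        states.append(state)
    return states
```

For example, `trace([(True, False), (False, False), (False, True), (False, False), (True, False)])` restores one file, runs out, streams, loses the connection, and resumes from the archive. If neither the archive nor the primary has anything new, the model simply alternates between the two states, which plays the role of the separate "retrying" state in the three-state design.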

-- 
 Simon Riggs           www.2ndQuadrant.com

