Re: Streaming replication, retrying from archive

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, retrying from archive
Date: 2010-01-14 22:19:48
Message-ID: 4B4F9884.3090009@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> On Fri, Jan 15, 2010 at 12:23 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> If we don't fix that within the server, we will need to document that
>> caveat and every installation will need to work around that one way or
>> another. Maybe with some monitoring software and an automatic restart. Ugh.
>>
>> I wasn't really asking if it's possible to fix, I meant "Let's think
>> about *how* to fix that".
>
> OK. How about the following (though it's a rough design)?
>
> (1) If walsender cannot read the WAL file because of ENOENT, it sends the
> special message indicating that error to walreceiver. This message is
> shipped on the COPY protocol.
>
> (2-a) If the message arrives, walreceiver exits by using proc_exit().
> (3-a) If the startup process detects the exit of walreceiver in
> WaitNextXLogAvailable(), it switches back to a normal archive recovery
> mode, closes the currently opened WAL file, resets some variables
> (readId, readSeg, etc), and calls FetchRecord() again. Then it tries to
> restore the WAL file from the archive if the restore_command is
> supplied, and switches to a streaming recovery mode again if invalid
> WAL is found.
>
> Or
>
> (2-b) If the message arrives, walreceiver executes restore_command, and
> then sets the receivedUpto to the end location of the restored WAL
> file. The restored file is expected to be filled because it doesn't
> exist in the primary's pg_xlog. So that update of the receivedUpto is
> OK.
> (3-b) After one WAL file is restored, walreceiver tries to connect to
> the primary, and starts replication again. If the ENOENT error occurs
> again, we go back to the (1).
>
> I like the latter approach since it's simpler. Thoughts?

Hmm. Executing restore_command in walreceiver process doesn't feel right
somehow. I'm thinking of:

Let's introduce a new boolean variable in shared memory that the
walreceiver can set to tell the startup process whether it's connected
(streaming) or disconnected. When the startup process sees that
walreceiver is connected, it waits for receivedUpto to advance.
Otherwise, it polls the archive using restore_command.

To actually implement that requires some refactoring of the
ReadRecord/FetchRecord logic in xlog.c. However, it always felt a bit
hacky to me anyway, so that's not necessarily a bad thing.

Now, one problem with this is that, under the right conditions,
walreceiver might succeed in reconnecting just as the startup process
starts to restore the file from the archive. That's OK: the streamed
file will simply be ignored, and the file restored from the archive uses
a temporary filename that won't clash with the streamed file. Still, it
feels a bit strange to have the same file copied to the server via both
mechanisms.

See the "replication-xlogrefactor" branch in my git repository for a
prototype of that. We could also combine that with your 1st design, and
add the special message to indicate "WAL already deleted", and change
the walreceiver restart logic as you suggested. Some restructuring of
Read/FetchRecord is probably required for that anyway.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
