Re: Streaming replication, retrying from archive

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, retrying from archive
Date: 2010-01-14 15:23:26
Message-ID: 4B4F36EE.5070401@enterprisedb.com
Lists: pgsql-hackers

Magnus Hagander wrote:
> On Thu, Jan 14, 2010 at 15:36, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Jan 14, 2010 at 9:15 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Imagine this scenario:
>>>
>>> 1. Master is up and running, standby is connected and streaming happily
>>> 2. Network goes down, connection is broken.
>>> 3. Standby falls behind a lot. Old WAL files that the standby needs are
>>> archived, and deleted from master.
>>> 4. Network is restored. Standby reconnects.
>>> 5. Standby will get an error because the WAL file it needs is not in the
>>> master anymore.
>>>
>>> What will currently happen is:
>>>
>>> 6. Standby retries connecting and failing indefinitely, until the admin
>>> restarts it.
>>>
>>> What we would *like* to happen is:
>>>
>>> 6. Standby fetches the missing WAL files from archive, then reconnects
>>> and continues streaming.
>>>
>>> Can we fix that?
>> Just MHO here, but this seems like a bigger project than we should be
>> starting at this stage of the game.
>
> +1.
>
> We want this eventually (heck, it'd be awesome!), but let's get what
> we have now stable first.

If we don't fix that within the server, we will need to document the
caveat, and every installation will need to work around it one way or
another, perhaps with monitoring software and an automatic restart. Ugh.

I wasn't really asking whether it's possible to fix; I meant "Let's think
about *how* to fix that".
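
To make the desired step 6 concrete, here is a rough sketch of the fallback
logic, written in Python rather than server code. It is only an illustration
under stated assumptions: `replay`, `stream_fetch`, `archive_fetch`,
`SegmentMissing` and the integer segment numbers are all hypothetical names
invented for this example, not existing PostgreSQL functions, and a real
implementation would have to deal with partial segments, timelines and
reconnection delays.

```python
from enum import Enum, auto


class Source(Enum):
    STREAM = auto()
    ARCHIVE = auto()


class SegmentMissing(Exception):
    """Hypothetical error: the master no longer has the requested WAL segment."""


def replay(start, target, stream_fetch, archive_fetch):
    """Replay segments start..target-1, preferring streaming over the archive.

    stream_fetch(seg) is assumed to raise SegmentMissing when the master has
    already removed the segment. archive_fetch(seg) is assumed to return True
    if the segment could be restored from the archive, False otherwise.
    Returns a list of (segment, Source) pairs in replay order.
    """
    replayed = []
    seg = start
    while seg < target:
        try:
            # Normal case: the master still has the segment, so stream it.
            stream_fetch(seg)
            replayed.append((seg, Source.STREAM))
        except SegmentMissing:
            # Fallback: fetch from the archive instead of retrying forever.
            if not archive_fetch(seg):
                raise RuntimeError(
                    f"segment {seg} is on neither the master nor the archive"
                )
            replayed.append((seg, Source.ARCHIVE))
        # The next iteration tries streaming again, which corresponds to
        # "reconnects and continues streaming" once the standby has caught up.
        seg += 1
    return replayed
```

The point of the sketch is that a missing segment on the master switches the
standby to the archive for that segment only, and streaming is retried for
the next one, so no admin restart is involved.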

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
