Re: pgsql: Follow TLI of last replayed record, not recovery target TLI, in

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Follow TLI of last replayed record, not recovery target TLI, in
Date: 2012-12-20 23:23:04
Message-ID: CAHGQGwE9LRNpZxo6m6Hfkc4Nyaw7p5xCRhsx7uKWvjdxSSkaTQ@mail.gmail.com
Lists: pgsql-committers

On Thu, Dec 20, 2012 at 9:41 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)iki(dot)fi> wrote:
> Follow TLI of last replayed record, not recovery target TLI, in walsenders.
>
> Most of the time, the last replayed record comes from the recovery target
> timeline, but there is a corner case where it makes a difference. When
> the startup process scans for a new timeline, and decides to change recovery
> target timeline, there is a window where the recovery target TLI has already
> been bumped, but there are no WAL segments from the new timeline in pg_xlog
> yet. For example, if we have just replayed up to point 0/30002D8, on
> timeline 1, there is a WAL file called 000000010000000000000003 in pg_xlog
> that contains the WAL up to that point. When recovery switches recovery
> target timeline to 2, a walsender can immediately try to read WAL from
> 0/30002D8, from timeline 2, so it will try to open WAL file
> 000000020000000000000003. However, that doesn't exist yet - the startup
> process hasn't copied that file from the archive yet nor has the walreceiver
> streamed it yet, so walsender fails with error "requested WAL segment
> 000000020000000000000003 has already been removed". That's harmless, in that
> the standby will try to reconnect later and by that time the segment is
> already created, but error messages that should be ignored are not good.
>
> To fix that, have walsender track the TLI of the last replayed record,
> instead of the recovery target timeline. That way walsender will not try to
> read anything from timeline 2, until the WAL segment has been created and at
> least one record has been replayed from it. The recovery target timeline is
> now xlog.c's internal affair; it doesn't need to be exposed in shared memory
> anymore.
>
> This fixes the error reported by Thom Brown. depesz reported the same error
> message, but I'm not sure if this fixes his scenario.

You seem to have forgotten to remove the following line from xlog.h.

src/include/access/xlog.h:312:extern TimeLineID GetRecoveryTargetTLI(void);

Regards,

--
Fujii Masao
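
For readers following the segment names in the quoted commit message, here is a
minimal sketch of how a timeline ID and a WAL position map to a segment file
name. It assumes the default 16 MB segment size and the usual three-field,
8-hex-digit naming layout. wal_file_name() and the macros are hypothetical names
made up for this illustration, not PostgreSQL's own functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default WAL segment size of 16 MB. */
#define WAL_SEGMENT_SIZE   UINT64_C(0x1000000)
/* Segments that fit into one 4 GB "log" (256 with 16 MB segments). */
#define SEGMENTS_PER_LOGID (UINT64_C(0x100000000) / WAL_SEGMENT_SIZE)
/* A segment file name is three 8-digit hex fields: 24 characters. */
#define WAL_FILE_NAME_LEN  24

/*
 * Hypothetical helper: build the segment file name for a timeline and a
 * WAL position written as hi/lo (for example 0/30002D8).
 */
void
wal_file_name(char *buf, size_t buflen, uint32_t tli,
              uint32_t lsn_hi, uint32_t lsn_lo)
{
    uint64_t lsn = ((uint64_t) lsn_hi << 32) | lsn_lo;
    uint64_t segno = lsn / WAL_SEGMENT_SIZE;

    /* The caller must leave room for the name plus the terminating NUL. */
    assert(buflen > WAL_FILE_NAME_LEN);

    snprintf(buf, buflen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli,
             (uint32_t) (segno / SEGMENTS_PER_LOGID),
             (uint32_t) (segno % SEGMENTS_PER_LOGID));
}
```

Calling wal_file_name(buf, sizeof buf, 1, 0x0, 0x30002D8) fills buf with
000000010000000000000003, and passing timeline 2 for the same position gives
000000020000000000000003. Only the first field differs, which is why a walsender
that follows the recovery target TLI can ask for a file that does not exist yet.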
