Re: [BUG] Archive recovery failure on 9.3+.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: hlinnakangas(at)vmware(dot)com
Cc: katsumata(dot)tomonari(at)po(dot)ntts(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUG] Archive recovery failure on 9.3+.
Date: 2014-02-13 11:37:06
Message-ID: 20140213.203706.132267509.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello, I might have misunderstood your words.

At Thu, 13 Feb 2014 10:11:22 +0200, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote in <52FC7E2A(dot)9060703(at)vmware(dot)com>
> On 02/13/2014 08:44 AM, Kyotaro HORIGUCHI wrote:
> >>>>> Wouldn't it be better to not archive the old segment, and instead
> >>>>> switch
> >>>>> to a new segment after writing the end-of-recovery checkpoint, so that
> >>>>> the segment on the new timeline is archived sooner?
> >>>>
> >>>> It would be better to zero-fill and switch segments, yes. We should
> >>>> NEVER be in a position of archiving two different versions of the same
> >>>> segment.
> >>>
> >>> Ok, I think we're in agreement that that's the way to go for master.

Does this mean that no automatic solution will be provided for
this situation, and that an operator should remove the older
segment with the same segment id before starting the recovery
process?

> > I've almost inclined to that but on some thoughts on the idea,
> > comming to think of recovery upto target timeline, the old
> > segment found to be necessary for the case. Without the old
> > segment, we would be obliged to seek to the first segment of the
> > *next* timeline (Is there any (simple) means to predict where is
> > it?) to complete the task.
>
> How did the server that created the new timeline get the old, partial,
> segment? Was it already archived? Or did the DBA copy it into pg_xlog
> manually? Or was it streamed by streaming replication? Whatever the
> mechanism, the same mechanism ought to make sure the old segment is
> available for PITR, too.

Sure.

> Hmm. If you have set up streaming replication and a WAL archive, and
> your master dies and you fail over to a standby, what you describe
> does happen. The partial old segment is not in the archive, so you
> cannot PITR to a point in the old timeline that falls within the
> partial segment (ie. just before the failover). However, it's not
> guaranteed that all the preceding WAL segments on the old timeline
> were already archived, anyway, so even if the partial segment is
> archived, it's not guaranteed to work.

Yes. Putting aside corrupted or missing segments in the archive,
my understanding is as follows. Either a master and its standby
(and likewise a standby and a cascaded standby, and so on) share
one WAL archive, or the archived WAL segments and all the
not-yet-archived WAL segments left in pg_xlog of the old master
should be merged into the WAL archive of the new master (the
promoted old slave), so that the online backup taken from the old
master remains usable. Even with a shared WAL archive, segments
missing from the archive should be filled in from pg_xlog, though.
Nevertheless, the process can be automated.
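
As a rough illustration of the merge step described above, here is a
minimal Python sketch. It is not taken from this thread: the function
name merge_wal_segments and both directory arguments are hypothetical,
and it relies only on WAL segment files being named with 24 hex digits.
It also assumes that every such file in pg_xlog really holds WAL, so it
does not tell recycled or pre-allocated segments apart; a real tool
would have to.

```python
import filecmp
import re
import shutil
from pathlib import Path

# WAL segment file names are 24 hex digits (timeline, log, segment).
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def merge_wal_segments(pg_xlog_dir, archive_dir):
    """Copy segments that are missing from the archive; never overwrite.

    Hypothetical helper. Returns (copied, conflicts): the names that
    were copied, and the names that exist in both places with different
    contents, which are left untouched for an operator to resolve.
    """
    src = Path(pg_xlog_dir)
    dst = Path(archive_dir)
    copied, conflicts = [], []
    for seg in sorted(src.iterdir()):
        # Skip archive_status/, *.history, backup labels and so on.
        if not seg.is_file() or not SEGMENT_NAME.match(seg.name):
            continue
        target = dst / seg.name
        if not target.exists():
            shutil.copy2(seg, target)
            copied.append(seg.name)
        elif not filecmp.cmp(seg, target, shallow=False):
            # Two different versions of the same segment: do not pick one.
            conflicts.append(seg.name)
    return copied, conflicts
```

Reporting conflicts instead of overwriting follows the point made
earlier in the thread that the archive should never end up holding two
different versions of the same segment.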

The test script at the start of this thread is for the
shared-archive case, and I had unconsciously assumed that as the
context.

> The old master is responsible for archiving the WAL on the old
> timeline, and the new master is responsible for archiving all the WAL
> on the new timeline. That's a straightforward, easy-to-understand
> rule.

Yes, I was somewhat confused because of my assumption of a shared
archive, but the archives can in fact be merged into a single
archive, and older versions of PostgreSQL could cope with that
situation.

> It might be useful to have a mode where the standby also
> archives all the received WAL, but that would need to be a separate
> option.

Perhaps there is no demand for such a mechanism :)

> > Is it the right way we kick the older one out of archive?
>
> If it's already in the archive, it's not going to be removed from the
> archive.

I had understood the conclusion so far to be that the older
segment is not archived at promotion, but that seems a bit odd
given what you said. If the conclusion here is that no aid will
be provided, as I now understand it, could you tell me why
recovery was changed to prefer the archive over pg_xlog?

Commit abf5c5c9a4 seems to have changed the behavior, but I can't
find the reason for the change.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
