Re: Fast promotion failure

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: amit(dot)kapila(at)huawei(dot)com
Cc: masao(dot)fujii(at)gmail(dot)com, hlinnakangas(at)vmware(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fast promotion failure
Date: 2013-05-13 00:23:52
Message-ID: 20130513.092352.30755878.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

2013/05/10 20:01 "Amit Kapila" <amit(dot)kapila(at)huawei(dot)com>:
> > > C 2013-05-10 15:32:32.170 JST 9242 FATAL: could not receive data
> > from WAL stream:
>
> Is there any chance that a network glitch caused this one-time
> error?

Unix domain sockets are unlikely to have such trouble. This
test ran within a single host.

> > I'm confused; the patch seems to me to ensure that the "first
> > checkpoint after fast promotion is performed" uses the
> > "correct, new, ThisTimeLineID".
>
> What is your confusion?

Heikki said in the first message in this thread that he suspected
the cause of the failure he had seen to be the wrong TLI on which
the checkpointer runs. Nevertheless, the patch you suggested to me
looks like it fixes that. Moreover, (one of?) the failures from the
same cause looks fixed with the patch.

Is the point of this discussion that the patch may leave out some
glitch in the timing of timeline-related changes, and that Heikki
saw a manifestation of that?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
