Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, andres(at)anarazel(dot)de
Cc: michael(dot)paquier(at)gmail(dot)com, sfrost(at)snowman(dot)net, nag1010(at)gmail(dot)com, jdnelson(at)dyn(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Date: 2018-04-23 07:41:47
Message-ID: 89e33d4f-5c75-0738-3dcb-44c4df59e920@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On 18/01/18 20:54, Kyotaro HORIGUCHI wrote:
> At Thu, 18 Jan 2018 11:52:52 -0800, Andres Freund <andres(at)anarazel(dot)de> wrote in <20180118195252(dot)hyxgkb3krcqjzfjm(at)alap3(dot)anarazel(dot)de>
>> On 2018-01-18 20:58:27 +0900, Michael Paquier wrote:
>>> b) The second patch that I would like to mention is doing things on the
>>> standby side by being able to change a WAL source when fetching a single
>>> record so as it is possible to fetch the beginning of a cross-segment
>>> record from say an archive or the local xlog, and then request the rest
>>> on the primary. This patch can be found in
>>> https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp
>>> and the diff in WaitForWALToBecomeAvailable() scares me a lot because,
>>> it seems to me that this actually breaks timeline switch
>>> consistency. The concept of switching a WAL source in the middle of a
>>> WAL segment is risky anyway, because we perform sanity checks while
>>> switching sources. The bug we are dealing about here is very rare, and
>>> seeing a problem like that for a timeline switch is even more rare, but
>>> the risk is not zero and we may finish with a corrupted instance, so we
>>> should not in my opinion take this approach. Also this actually achieves
>>> point 2) above, not completely though, but not 1).
>>
>> I don't agree with the sentiment about the approach, but I agree there
>> might be weaknesses in the implementation (which I have not reviewed). I
>> think it's perfectly sensible to require fetching one segment from
>> pg_xlog and the next one via another method. Currently the inability to
>> do so often leads to us constantly refetching the same segment.
>
> Though I'm still not fully confident, if reading a set of
> continuation records from two (or more) sources can lead to
> inconsistency, two successve (complete) records are facing the
> same risk.

This seems like the best approach to me as well.

Looking at the patch linked above
(https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):

> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -11693,6 +11693,10 @@ retry:
> Assert(reqLen <= readLen);
>
> *readTLI = curFileTLI;
> +
> + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
> + goto next_record_is_invalid;
> +
> return readLen;
>
> next_record_is_invalid:

What is the point of adding this XLogReaderValidatePageHeader() call?
The caller of this callback function, ReadPageInternal(), checks the
page header anyway. Earlier in this thread, you said:

> The rest to do is let XLogPageRead retry other sources
> immediately. To do this I made ValidXLogPageHeader(at)xlogreader(dot)c
> public (and renamed to XLogReaderValidatePageHeader).

but I don't understand that.

- Heikki

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Heikki Linnakangas 2018-04-23 07:48:38 Re: BUG #15167: error C2365: 'errcode' : redefi nition; previous definition
Previous Message Hao Lee 2018-04-23 06:49:52 Re: BUG #15167: error C2365: 'errcode' : redefi nition; previous definition

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Gierth 2018-04-23 08:04:52 Re: Toast issues with OldestXmin going backwards
Previous Message Michael Paquier 2018-04-23 07:25:24 Re: BGWORKER_BYPASS_ALLOWCONN used nowhere (infra part of on-line checksum switcher)