Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, hlinnaka(at)iki(dot)fi
Cc: andres(at)anarazel(dot)de, michael(dot)paquier(at)gmail(dot)com, sfrost(at)snowman(dot)net, nag1010(at)gmail(dot)com, jdnelson(at)dyn(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Date: 2018-05-04 07:05:41
Message-ID: 28f5a502-8f27-94af-6d5a-ed939be87125@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On 24/04/18 13:57, Kyotaro HORIGUCHI wrote:
> At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote in <89e33d4f-5c75-0738-3dcb-44c4df59e920(at)iki(dot)fi>
>> Looking at the patch linked above
>> (https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):
>>
>>> --- a/src/backend/access/transam/xlog.c
>>> +++ b/src/backend/access/transam/xlog.c
>>> @@ -11693,6 +11693,10 @@ retry:
>>> Assert(reqLen <= readLen);
>>> *readTLI = curFileTLI;
>>> +
>>> + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
>>> readBuf))
>>> + goto next_record_is_invalid;
>>> +
>>> return readLen;
>>> next_record_is_invalid:
>>
>> What is the point of adding this XLogReaderValidatePageHeader() call?
>> The caller of this callback function, ReadPageInternal(), checks the
>> page header anyway. Earlier in this thread, you said:
>
> Without the lines, server actually fails to start replication.
>
> (I try to remember the details...)
>
> The difference is whether the function can cause retry for the
> same portion of a set of continued records (without changing the
> plugin API). XLogPageRead can do that. On the other hand all
> ReadPageInternal can do is just letting the caller ReadRecords
> retry from the beginning of the set of continued records since
> the caller side handles only complete records.
>
> In the failure case, in XLogPageRead, when read() fails, it can
> try the next source midst of a continued records. On the other
> hand if the segment was read but it was recycled one, it passes
> "success" to ReadPageInternal and leads to retring from the
> beginning of the recrod. Infinite loop.

I see. You have the same problem if you have a WAL file that's corrupt
in some other way, but one of the sources would have a correct copy.
ValidXLogPageHeader() only checks the page header, after all. But unlike
a recycled WAL segment, that's not supposed to happen as part of normal
operation, so I guess we can live with that.

> Calling XLogReaderValidatePageHeader in ReadPageInternal is
> redundant, but removing it may be interface change of xlogreader
> plugin and I am not sure that is allowed.

We should avoid doing that in back-branches, at least. But in 'master',
I wouldn't mind redesigning the API. Dealing with all the retrying is
pretty complicated as it is, if we can simplify that somehow, I think
that'd be good.

- Heikki

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2018-05-04 10:24:23 BUG #15186: how get data from db files
Previous Message Andrew Gierth 2018-05-03 15:40:42 Re: BUG #15184: Planner overestimates number of rows in empty table

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Gierth 2018-05-04 07:37:31 Re: Optimze usage of immutable functions as relation
Previous Message Aleksandr Parfenov 2018-05-04 05:42:44 Optimze usage of immutable functions as relation