Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: andres(at)anarazel(dot)de, michael(dot)paquier(at)gmail(dot)com, sfrost(at)snowman(dot)net, nag1010(at)gmail(dot)com, jdnelson(at)dyn(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Date: 2018-05-04 22:49:31
Message-ID: 47215279-228d-f30d-35d1-16af695e53f3@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On 04/05/18 10:05, Heikki Linnakangas wrote:
> On 24/04/18 13:57, Kyotaro HORIGUCHI wrote:
>> At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote in <89e33d4f-5c75-0738-3dcb-44c4df59e920(at)iki(dot)fi>
>>> Looking at the patch linked above
>>> (https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):
>>>
>>>> --- a/src/backend/access/transam/xlog.c
>>>> +++ b/src/backend/access/transam/xlog.c
>>>> @@ -11693,6 +11693,10 @@ retry:
>>>> Assert(reqLen <= readLen);
>>>> *readTLI = curFileTLI;
>>>> +
>>>> + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
>>>> readBuf))
>>>> + goto next_record_is_invalid;
>>>> +
>>>> return readLen;
>>>> next_record_is_invalid:
>>>
>>> What is the point of adding this XLogReaderValidatePageHeader() call?
>>> The caller of this callback function, ReadPageInternal(), checks the
>>> page header anyway. Earlier in this thread, you said:
>>
>> Without the lines, server actually fails to start replication.
>>
>> (I try to remember the details...)
>>
>> The difference is whether the function can cause retry for the
>> same portion of a set of continued records (without changing the
>> plugin API). XLogPageRead can do that. On the other hand all
>> ReadPageInternal can do is just letting the caller ReadRecords
>> retry from the beginning of the set of continued records since
>> the caller side handles only complete records.
>>
>> In the failure case, in XLogPageRead, when read() fails, it can
>> try the next source midst of a continued records. On the other
>> hand if the segment was read but it was recycled one, it passes
>> "success" to ReadPageInternal and leads to retring from the
>> beginning of the recrod. Infinite loop.
>
> I see. You have the same problem if you have a WAL file that's corrupt
> in some other way, but one of the sources would have a correct copy.
> ValidXLogPageHeader() only checks the page header, after all. But unlike
> a recycled WAL segment, that's not supposed to happen as part of normal
> operation, so I guess we can live with that.

Pushed this now, after adding some comments. Thanks!

>> Calling XLogReaderValidatePageHeader in ReadPageInternal is
>> redundant, but removing it may be interface change of xlogreader
>> plugin and I am not sure that is allowed.
>
> We should avoid doing that in back-branches, at least. But in 'master',
> I wouldn't mind redesigning the API. Dealing with all the retrying is
> pretty complicated as it is, if we can simplify that somehow, I think
> that'd be good.

I spent some time musing on what a better API might look like. We could
remove the ReadPage callback, and instead have XLogReadRecord return a
special return code to mean "give me more data". I'm thinking of
something like:

/* return code of XLogReadRecord() */
typedef enum
{
XLREAD_SUCCESS,
XLREAD_INVALID_RECORD, /* a record was read, but it was corrupt */
XLREAD_INVALID_PAGE, /* the page that was supplied looks invalid. */
XLREAD_NEED_DATA, /* caller should place more data in buffer,
and retry */
} XLogReadRecord_Result;

And the calls to XLogReadRecord() would look something like this:

for(;;)
{
rc = XLogReadRecord(reader, startptr, errormsg);

if (rc == XLREAD_SUCCESS)
{
/* great, got record */
}
if (rc == XLREAD_INVALID_PAGE || XLREAD_INVALID_RECORD)
{
elog(ERROR, "invalid record");
}
if (rc == XLREAD_NEED_DATA)
{
/*
* Read a page from disk, and place it into reader->readBuf
*/
XLogPageRead(reader->readPagePtr, /* page to read */
reader->reqLen /* # of bytes to read */ );
/*
* Now that we have read the data that XLogReadRecord()
* requested, call it again.
*/
continue;
}
}

So instead of having a callback, XLogReadRecord() would return
XLREAD_NEED_DATA. The caller would then need to place that data into the
buffer, and call it again. If a record spans multiple pages,
XLogReadRecord() would return with XLREAD_NEED_DATA multiple times, to
read each page.

The important difference for the bug we're discussing on this thread is
is that if you passed an invalid page to XLogReadRecord(), it would
return with XLREAD_INVALID_PAGE. You could then try reading the same
page from a different source, and call XLogReadRecord() again, and it
could continue where it was left off, even if it was in the middle of a
continuation record.

This is clearly not backpatchable, but maybe something to think about
for v12.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-05-04 23:00:18 Draft release notes are up
Previous Message Mike Blackwell 2018-05-04 22:26:24 Re: perlcritic (was Re: pgsql: Fix precedence problem in new Perl code.)

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2018-05-05 13:53:47 BUG #15188: wrong count "tuples removed" in autovacuum log file
Previous Message chenhj 2018-05-04 17:16:33 Re:Re: BUG #15187: When use huge page, there may be a lot of hanged connections with status startup or authentication