Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: hlinnaka(at)iki(dot)fi
Cc: andres(at)anarazel(dot)de, michael(dot)paquier(at)gmail(dot)com, sfrost(at)snowman(dot)net, nag1010(at)gmail(dot)com, jdnelson(at)dyn(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Date: 2018-05-14 06:59:13
Message-ID: 20180514.155913.76739505.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Thank you for adding the detailed comment and commiting.

At Sat, 5 May 2018 01:49:31 +0300, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote in <47215279-228d-f30d-35d1-16af695e53f3(at)iki(dot)fi>
> On 04/05/18 10:05, Heikki Linnakangas wrote:
> > On 24/04/18 13:57, Kyotaro HORIGUCHI wrote:
> >> At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas
> >> <hlinnaka(at)iki(dot)fi> wrote in
> >> <89e33d4f-5c75-0738-3dcb-44c4df59e920(at)iki(dot)fi>
> >>> Looking at the patch linked above
> >>> (https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):
> >>>
> >>>> --- a/src/backend/access/transam/xlog.c
> >>>> +++ b/src/backend/access/transam/xlog.c
> >>>> @@ -11693,6 +11693,10 @@ retry:
> >>>> Assert(reqLen <= readLen);
> >>>> *readTLI = curFileTLI;
> >>>> +
> >>>> + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
> >>>> readBuf))
> >>>> + goto next_record_is_invalid;
> >>>> +
> >>>> return readLen;
> >>>> next_record_is_invalid:
> >>>
> >>> What is the point of adding this XLogReaderValidatePageHeader() call?
> >>> The caller of this callback function, ReadPageInternal(), checks the
> >>> page header anyway. Earlier in this thread, you said:
> >>
> >> Without the lines, server actually fails to start replication.
> >>
> >> (I try to remember the details...)
> >>
> >> The difference is whether the function can cause retry for the
> >> same portion of a set of continued records (without changing the
> >> plugin API). XLogPageRead can do that. On the other hand all
> >> ReadPageInternal can do is just letting the caller ReadRecords
> >> retry from the beginning of the set of continued records since
> >> the caller side handles only complete records.
> >>
> >> In the failure case, in XLogPageRead, when read() fails, it can
> >> try the next source midst of a continued records. On the other
> >> hand if the segment was read but it was recycled one, it passes
> >> "success" to ReadPageInternal and leads to retring from the
> >> beginning of the recrod. Infinite loop.
> > I see. You have the same problem if you have a WAL file that's corrupt
> > in some other way, but one of the sources would have a correct copy.
> > ValidXLogPageHeader() only checks the page header, after all. But
> > unlike
> > a recycled WAL segment, that's not supposed to happen as part of
> > normal
> > operation, so I guess we can live with that.

Anyway we read successive complete records from different sources
so I think if such curruption causes a problem we would have
faced the same problem even without this.

> Pushed this now, after adding some comments. Thanks!

Thanks!

> >> Calling XLogReaderValidatePageHeader in ReadPageInternal is
> >> redundant, but removing it may be interface change of xlogreader
> >> plugin and I am not sure that is allowed.
> > We should avoid doing that in back-branches, at least. But in
> > 'master',
> > I wouldn't mind redesigning the API. Dealing with all the retrying is
> > pretty complicated as it is, if we can simplify that somehow, I think
> > that'd be good.

Agreed.

> I spent some time musing on what a better API might look like. We
> could remove the ReadPage callback, and instead have XLogReadRecord
> return a special return code to mean "give me more data". I'm thinking
> of something like:

Sounds reasonable. That makes an additional return/call iteration
to XLogReadRecord but it would be ignorable comparing to the cost
of XLogPageRead.

> /* return code of XLogReadRecord() */
> typedef enum
> {
> XLREAD_SUCCESS,
> XLREAD_INVALID_RECORD, /* a record was read, but it was corrupt */
> XLREAD_INVALID_PAGE, /* the page that was supplied looks invalid. */
> XLREAD_NEED_DATA, /* caller should place more data in buffer, and
> retry */
> } XLogReadRecord_Result;
>
>
> And the calls to XLogReadRecord() would look something like this:
>
> for(;;)
> {
> rc = XLogReadRecord(reader, startptr, errormsg);
>
> if (rc == XLREAD_SUCCESS)
> {
> /* great, got record */
> }
> if (rc == XLREAD_INVALID_PAGE || XLREAD_INVALID_RECORD)
> {
> elog(ERROR, "invalid record");
> }
> if (rc == XLREAD_NEED_DATA)
> {
> /*
> * Read a page from disk, and place it into reader->readBuf
> */
> XLogPageRead(reader->readPagePtr, /* page to read */
> reader->reqLen /* # of bytes to read */ );
> /*
> * Now that we have read the data that XLogReadRecord()
> * requested, call it again.
> */
> continue;
> }
> }
>
> So instead of having a callback, XLogReadRecord() would return
> XLREAD_NEED_DATA. The caller would then need to place that data into
> the buffer, and call it again. If a record spans multiple pages,
> XLogReadRecord() would return with XLREAD_NEED_DATA multiple times, to
> read each page.

It seems easier to understand at a glance. In short, I like it.

> The important difference for the bug we're discussing on this thread
> is is that if you passed an invalid page to XLogReadRecord(), it would
> return with XLREAD_INVALID_PAGE. You could then try reading the same
> page from a different source, and call XLogReadRecord() again, and it
> could continue where it was left off, even if it was in the middle of
> a continuation record.

We will have more control on how to continue reading WAL with
this than the current code and it seems preferable as the whole.

> This is clearly not backpatchable, but maybe something to think about
> for v12.

Are you planning to working on this shortly?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2018-05-14 07:44:22 Re: PANIC during crash recovery of a recently promoted standby
Previous Message Simon Muller 2018-05-14 06:35:34 Re: Allow COPY's 'text' format to output a header

Browse pgsql-bugs by date

  From Date Subject
Next Message Ralf Jung 2018-05-14 09:15:44 "REVOKE ... ON DATABASE template1 ..." has no effect
Previous Message Tom Lane 2018-05-14 05:18:42 Re: Abnormal JSON query performance