Re: The XLogFindNextRecord() routine find incorrect record start point after a long continuation record

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: The XLogFindNextRecord() routine find incorrect record start point after a long continuation record
Date: 2019-11-06 17:41:29
Message-ID: CAHGQGwH22gP_iUP=hCEONTqS-abKnQ44K+wEwYdur=81Y8pSEQ@mail.gmail.com
Lists: pgsql-bugs

On Wed, Nov 6, 2019 at 2:07 PM Andrey Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>
>
>
> On 06/11/2019 09:41, Michael Paquier wrote:
> > On Wed, Nov 06, 2019 at 07:40:48AM +0500, Andrey Lepikhov wrote:
> >> I found this in our multimaster project on PostgreSQL 11.5. It is difficult
> >> to reproduce this error, but I will try to do it if necessary.
> >>
> >> The rest of a continuation WAL-record can exactly match the block size. In
> >> this case, we need to switch targetPagePtr to the next block before
> >> calculating the starting point of the next WAL-record.
> >> See the patch in attachment for the bug fix.

Good catch!

> > What's the error you actually saw after reading the record in
> > xlogreader.c? If you have past WAL archives, perhaps you are able to
> > reproduce the problem with a given WAL segment and pg_waldump?
>
> I saw the message:
> pg_waldump: xlogreader.c:264: XLogReadRecord: <Text in russian>
> "((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + ((8)
> - 1)) & ~((uintptr_t) ((8) - 1))))" <Text in russian>

I created the problematic WAL file artificially by using
pg_logical_emit_message() and successfully reproduced
the error. I attached the WAL file that I created. You can
reproduce the issue by running:

pg_waldump 000000010000000000000008 -s 0/08002028

> Yes, I reproduced error with pg_waldump too. The patch in previous
> letter fixed this problem.

The patch looks good to me. Barring any objection, I will commit it.
XLogFindNextRecord() must return a valid record starting position,
but in the case that you described it can currently return the
starting position of a WAL page, which is not a valid WAL record
start. This is the cause of the issue.
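
To make the arithmetic concrete, here is a minimal sketch of the
boundary case. It is a simplified model, not the code in xlogreader.c
and not the attached patch. The function names, the 24-byte page
header size and the page address in the usage note are assumptions
for illustration only; the 8192 block size is taken from the
assertion quoted above. The model also assumes that rem_len is
already aligned and that the next page uses a short header.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ   8192u  /* WAL block size, as in the quoted assertion */
#define PAGE_HDR 24u    /* ASSUMED short page header size, illustration only */

/*
 * Hypothetical model of the current behaviour: skip the page header and
 * the remaining continuation data on the page that starts at page_ptr.
 */
uint64_t
next_start_current(uint64_t page_ptr, uint32_t rem_len)
{
    assert(page_ptr % BLCKSZ == 0);
    assert(rem_len <= BLCKSZ - PAGE_HDR);

    /* When rem_len == BLCKSZ - PAGE_HDR this is exactly the next page start. */
    return page_ptr + PAGE_HDR + rem_len;
}

/*
 * Hypothetical model of the intended behaviour: if the continuation data
 * fills the page exactly, move on to the next page and skip its header too.
 */
uint64_t
next_start_expected(uint64_t page_ptr, uint32_t rem_len)
{
    uint64_t ptr = next_start_current(page_ptr, rem_len);

    if (ptr % BLCKSZ == 0)      /* landed on a page boundary */
        ptr += PAGE_HDR;        /* first record follows that page's header */
    return ptr;
}
```

With an illustrative page address of 0x2000 and rem_len = 8192 - 24,
next_start_current() yields 0x4000, whose offset within the page is 0.
That is smaller than the page header size, which is the condition the
assertion rejects. next_start_expected() yields 0x4018 for the same
input.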

Regards,

--
Fujii Masao

Attachment Content-Type Size
000000010000000000000008.tar.bz2 application/x-bzip2 315 bytes
