Re: WAL replay issue from 9.6.8 to 9.6.10

From: Dave Peticolas <dave(at)krondo(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: WAL replay issue from 9.6.8 to 9.6.10
Date: 2018-08-29 16:15:29
Message-ID: CAPRbp07+YXvxY674FcmQVaJnJ=B+12JuAkNG+pkLf0Ph6oZgCw@mail.gmail.com
Lists: pgsql-general

On Wed, Aug 29, 2018 at 4:54 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Wed, Aug 29, 2018 at 08:31:50AM +0200, Alexander Kukushkin wrote:
> > 2018-08-29 6:02 GMT+02:00 Dave Peticolas <dave(at)krondo(dot)com>:
> >> Hello, I'm seeing some issues with WAL replay on a test server running
> >> 9.6.10 using WAL archived from a 9.6.8 primary server. It reliably PANICs
> >> during replay with messages like so:
> >>
> >> WARNING: page 1209270272 of relation base/16422/47496599 does not exist
> >> CONTEXT: xlog redo at 4810/C84F8A0 for Btree/DELETE: 88 items
> >> PANIC: WAL contains references to invalid pages
> >
> >
> > it looks like you are hitting pretty much the same problem as I:
> > https://www.postgresql.org/message-id/flat/153492341830.1368.3936905691758473953%40wrigleys.postgresql.org
> > The only major difference is that you are restoring from a backup, while
> > in my case the host running the replica crashed.
> > Also, in my case the primary was already running 9.6.10.
> >
> > In my case, it also panics during "Btree/DELETE: XYZ items", and the page
> > number of the relation is insanely huge.
>
> That would be the same problem. Dave, do you have a background worker
> running in parallel or some read-only workload with backends doing
> read-only operations on a standby once it has reached a consistent
> point?
>
>
Oh, perhaps I do, depending on what you mean by worker. There are a couple
of periodic processes that connect to the server to obtain metrics. Is that
what is triggering this issue? In my case I could probably suspend them
until the replay has reached the desired point.
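If the metrics collectors are what triggers this, one way to hold them off is to have each collector check how far replay has progressed before it runs any other queries. Below is a minimal sketch in Python, assuming the collector reads the standby's position with `SELECT pg_last_xlog_replay_location();` (the 9.6 name of that function) and that the target LSN is picked by hand. `parse_lsn` and `replay_reached` are hypothetical helper names, and the database query itself is left out:

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '4810/C84F8A0' into a 64-bit integer.

    The part before the slash is the high 32 bits and the part after it is
    the low 32 bits, both written in hexadecimal.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replay_reached(current_lsn: str, target_lsn: str) -> bool:
    """Return True once the replay position is at or past target_lsn.

    Assumption: current_lsn is the text returned by
    pg_last_xlog_replay_location() on the standby; target_lsn is a
    hypothetical, manually chosen point past which collectors may run.
    """
    return parse_lsn(current_lsn) >= parse_lsn(target_lsn)


if __name__ == "__main__":
    # A collector would skip its run until this becomes True.
    print(replay_reached("4810/C84F8A0", "4811/0"))
```

Comparing the parsed integers rather than the raw strings matters because the low half is not zero-padded, so a plain string comparison would order some LSNs incorrectly.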

I have noticed this behavior in the past, but prior to 9.6.10 restarting the
server would fix the issue. Replay also always seemed to reach a point past
which the problem would not recur.

dave
