From: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Alexander Kukushkin <cyberdemn(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15346: Replica fails to start after the crash |
Date: | 2018-08-22 15:42:16 |
Message-ID: | CA+q6zcVjv1Lp-3=prBbpq2CbBioK91SHarfw3F8FHuUN4EwcUA@mail.gmail.com |
Lists: | pgsql-bugs pgsql-hackers |
> On Wed, 22 Aug 2018 at 17:08, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> > On 2018-Aug-22, Alexander Kukushkin wrote:
>
> > 2018-08-22 16:44 GMT+02:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
> >
> > >
> > > Sounds likely. I suggest to have a look at what's going on inside the
> > > postmaster process when it gets stuck.
> >
> > Well, it doesn't get stuck, it aborts start with the message:
> > 2018-08-22 14:26:42.073 UTC,,,28485,,5b7d7282.6f45,23,,2018-08-22 14:26:10 UTC,1/0,0,WARNING,01000,"page 179503104 of relation base/18055/212875 does not exist",,,,,"xlog redo at AB3/50323E78 for Btree/DELETE: 182 items",,,,""
> > 2018-08-22 14:26:42.073 UTC,,,28485,,5b7d7282.6f45,24,,2018-08-22 14:26:10 UTC,1/0,0,PANIC,XX000,"WAL contains references to invalid pages",,,,,"xlog redo at AB3/50323E78 for Btree/DELETE: 182 items",,,,""
> > 2018-08-22 14:26:42.214 UTC,,,28483,,5b7d7282.6f43,3,,2018-08-22 14:26:10 UTC,,0,LOG,00000,"startup process (PID 28485) was terminated by signal 6: Aborted",,,,,,,,,""
>
> Oh, that's weird ... sounds like the fact that the bgworker starts
> somehow manages to corrupt the list of invalid pages in the startup
> process. That doesn't make any sense ...
We can see that the crash itself happens in XLogReadBufferExtended at
`if (PageIsNew(page))` (xlogutils.c:512): we get a page that apparently
hasn't been initialized yet, and since we've already reached a consistent
state, log_invalid_page panics.
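For readers following the thread, a simplified model of the path described above may help. This is a hypothetical sketch, not the PostgreSQL source: `ModelPage`, `ModelRelation`, `ModelRecovery`, `read_block_model` and `log_invalid_page_model` are made-up names, only the RBM_NORMAL read mode is modelled, and the invalid-page hash table is reduced to a counter. It shows the two branches that end in log_invalid_page (a block number past the end of the relation, reported as "does not exist", and a page for which PageIsNew is true, i.e. pd_upper == 0, reported as "is uninitialized"), and why the same invalid reference is only remembered before consistency but causes an immediate PANIC afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified model of invalid-page handling during WAL
 * replay; NOT the code from xlogutils.c. */

typedef enum
{
    READ_OK,                 /* block exists and is initialized */
    READ_INVALID_REMEMBERED, /* before consistency: noted for later checking */
    READ_PANIC               /* after consistency: startup process aborts */
} ReadOutcome;

typedef struct
{
    uint16_t pd_upper;       /* 0 means the page was never initialized */
} ModelPage;

typedef struct
{
    uint32_t nblocks;        /* current relation length in blocks */
    const ModelPage *pages;  /* nblocks entries */
} ModelRelation;

typedef struct
{
    bool reached_consistency;
    unsigned invalid_pages;  /* stand-in for the invalid-page hash table */
} ModelRecovery;

/* Analogue of log_invalid_page(): 'present' is false when the block lies
 * past the end of the relation, true when it exists but is all-new. */
ReadOutcome log_invalid_page_model(ModelRecovery *rec, uint32_t blkno,
                                   bool present)
{
    if (rec->reached_consistency)
    {
        /* Mirrors the WARNING that precedes the PANIC in the real server. */
        fprintf(stderr, "WARNING: page %u %s\n", (unsigned) blkno,
                present ? "is uninitialized" : "does not exist");
        fprintf(stderr, "PANIC: WAL contains references to invalid pages\n");
        return READ_PANIC;
    }
    /* Before consistency the reference is only remembered; it may be
     * resolved later, e.g. if the relation is truncated or dropped. */
    rec->invalid_pages++;
    return READ_INVALID_REMEMBERED;
}

/* Analogue of XLogReadBufferExtended() in RBM_NORMAL mode. */
ReadOutcome read_block_model(ModelRecovery *rec, const ModelRelation *rel,
                             uint32_t blkno)
{
    assert(rec != NULL && rel != NULL);

    if (blkno >= rel->nblocks)
        return log_invalid_page_model(rec, blkno, false);

    /* Analogue of PageIsNew(page): pd_upper == 0 */
    if (rel->pages[blkno].pd_upper == 0)
        return log_invalid_page_model(rec, blkno, true);

    return READ_OK;
}
```

For example, with a two-block relation whose second page has pd_upper == 0, reading block 1 before consistency only increments invalid_pages, while the same read after setting reached_consistency returns READ_PANIC.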
> ENOTIME for a closer look ATM, though, sorry. Maybe you could try
> running under valgrind?
Could you please elaborate on what we could find using valgrind in this case?
Some memory leaks? In any case, there is a chance that the problem won't
reproduce at all under valgrind, since even slow tracing under gdb makes this
race condition disappear.