Re: BUG #15346: Replica fails to start after the crash

From: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15346: Replica fails to start after the crash
Date: 2018-08-30 14:03:43
Message-ID: CAFh8B==_Acxr_-K0iRfWdVoAn_pfgzpvu+5AsdaWTEkttwhnnw@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Hi,

2018-08-30 15:39 GMT+02:00 Michael Paquier <michael(at)paquier(dot)xyz>:

> That's indeed obvious by reading the code. The bgwriter would be
> started only once a consistent point has been reached, so the startup
> process would have normally already updated the control file to the
> consistent point. Something like the attached should take care of the
> problem. As updates of the local copy of minRecoveryPoint depend
> strongly on whether the startup process is the one running, I think
> that we should use InRecovery for the sanity checks.
>
> I'd also like to add a TAP test for that, which should be easy enough
> if we do the sanity checks by looking at the contents of the control
> file. I'll give that some more thought.
>
> Does it take care of the problem?

Yep, with the patch applied, the bgwriter acts as expected!

Breakpoint 1, UpdateControlFile () at xlog.c:4536
4536 INIT_CRC32C(ControlFile->crc);
(gdb) bt
#0 UpdateControlFile () at xlog.c:4536
#1 0x00005646d071ddb2 in UpdateMinRecoveryPoint (lsn=26341965784, force=0 '\000') at xlog.c:2597
#2 0x00005646d071de65 in XLogFlush (record=26341965784) at xlog.c:2632
#3 0x00005646d09d693a in FlushBuffer (buf=0x7f8e1ca523c0, reln=0x5646d2e86028) at bufmgr.c:2729
#4 0x00005646d09d63d6 in SyncOneBuffer (buf_id=99693, skip_recently_used=1 '\001', wb_context=0x7ffd07757380) at bufmgr.c:2394
#5 0x00005646d09d6172 in BgBufferSync (wb_context=0x7ffd07757380) at bufmgr.c:2270
#6 0x00005646d097c266 in BackgroundWriterMain () at bgwriter.c:279
#7 0x00005646d073b38c in AuxiliaryProcessMain (argc=2, argv=0x7ffd07758840) at bootstrap.c:424
#8 0x00005646d098dc4a in StartChildProcess (type=BgWriterProcess) at postmaster.c:5300
#9 0x00005646d098d672 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:4999
#10 <signal handler called>
#11 0x00007f8e5f5a6ff7 in __GI___select (nfds=5, readfds=0x7ffd07759060, writefds=0x0, exceptfds=0x0, timeout=0x7ffd07758fd0) at ../sysdeps/unix/sysv/linux/select.c:41
#12 0x00005646d09890ca in ServerLoop () at postmaster.c:1685
#13 0x00005646d0988799 in PostmasterMain (argc=17, argv=0x5646d2e53390) at postmaster.c:1329
#14 0x00005646d08d2880 in main (argc=17, argv=0x5646d2e53390) at main.c:228

Regards,
--
Alexander Kukushkin
