Re: WAL logging problem in 9.4.3?

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL logging problem in 9.4.3?
Date: 2016-12-02 04:39:42
Message-ID: CAJrrPGffYzGCLZLfg6Q-RNR5H8ayLjKhYhWOkmfTU9-Af5n+cA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 9, 2016 at 5:55 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:

>
>
> On Wed, Nov 9, 2016 at 9:27 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> > On Wed, Nov 9, 2016 at 5:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> On Thu, Feb 4, 2016 at 7:24 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >>> I dropped the ball on this one back in July, so here's an attempt to
> >>> revive this thread.
> >>>
> >>> I spent some time fixing the remaining issues with the prototype patch I
> >>> posted earlier, and rebased that on top of current git master. See
> >>> attached.
> >>>
> >>> Some review of that would be nice. If there are no major issues with it,
> >>> I'm going to create backpatchable versions of this for 9.4 and below.
> >>
> >> Are you going to do commit something here? This thread and patch are
> >> now 14 months old, which is a long time to make people wait for a bug
> >> fix. The status in the CF is "Ready for Committer" although I am not
> >> sure if that's accurate.
> >
> > "Needs Review" is definitely a better definition of its current state.
> > The last time I had a look at this patch I thought that it was in
> > pretty good shape (not Horiguchi-san's version, but the one in
> > https://www.postgresql.org/message-id/CAB7nPqR+3JjS=JB3R=AxxkXCyEB-q77U-ERW7_uKAJCtWNTfrg(at)mail(dot)gmail(dot)com).
> > With some of the recent changes, surely it needs a second look, things
> > related to heap handling tend to rot quickly.
> >
> > I'll look into it once again by the end of this week if Heikki does
> > not show up, the rest will be on him I am afraid...
>
> I have been able to hit a crash with recovery test 008:
> (lldb) bt
> * thread #1: tid = 0x0000, 0x00007fff96d48f06 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
> * frame #0: 0x00007fff96d48f06 libsystem_kernel.dylib`__pthread_kill + 10
> frame #1: 0x00007fff9102e4ec libsystem_pthread.dylib`pthread_kill + 90
> frame #2: 0x00007fff8e5cc6df libsystem_c.dylib`abort + 129
> frame #3: 0x0000000106ef10f0 postgres`ExceptionalCondition(conditionName="!(( !( ((void) ((bool) (! (!((buffer) <= NBuffers && (buffer) >= -NLocBuffer)) || (ExceptionalCondition(\"!((buffer) <= NBuffers && (buffer) >= -NLocBuffer)\", (\"FailedAssertion\"), \"bufmgr.c\", 2593), 0)))), (buffer) != 0 ) ? ((bool) 0) : ((buffer) < 0) ? (LocalRefCount[-(buffer) - 1] > 0) : (GetPrivateRefCount(buffer) > 0) ))", errorType="FailedAssertion", fileName="bufmgr.c", lineNumber=2593) + 128 at assert.c:54
> frame #4: 0x0000000106cf4a2c postgres`BufferGetBlockNumber(buffer=0) + 204 at bufmgr.c:2593
> frame #5: 0x000000010694e6ad postgres`HeapNeedsWAL(rel=0x00007f9454804118, buf=0) + 61 at heapam.c:9234
> frame #6: 0x000000010696d8bd postgres`visibilitymap_set(rel=0x00007f9454804118, heapBlk=1, heapBuf=0, recptr=50841176, vmBuf=118, cutoff_xid=866, flags='\x01') + 989 at visibilitymap.c:310
> frame #7: 0x000000010695d020 postgres`heap_xlog_visible(record=0x00007f94520035d0) + 896 at heapam.c:8148
> frame #8: 0x000000010695c582 postgres`heap2_redo(record=0x00007f94520035d0) + 242 at heapam.c:9107
> frame #9: 0x00000001069d132d postgres`StartupXLOG + 9181 at xlog.c:6950
> frame #10: 0x0000000106c9d783 postgres`StartupProcessMain + 339 at startup.c:216
> frame #11: 0x00000001069ee6ec postgres`AuxiliaryProcessMain(argc=2, argv=0x00007fff59316d80) + 1676 at bootstrap.c:420
> frame #12: 0x0000000106c98002 postgres`StartChildProcess(type=StartupProcess) + 322 at postmaster.c:5221
> frame #13: 0x0000000106c96031 postgres`PostmasterMain(argc=3, argv=0x00007f9451c04210) + 6033 at postmaster.c:1301
> frame #14: 0x0000000106bc30cf postgres`main(argc=3, argv=0x00007f9451c04210) + 751 at main.c:228
> (lldb) up 1
> frame #4: 0x0000000106cf4a2c postgres`BufferGetBlockNumber(buffer=0) + 204 at bufmgr.c:2593
> 2590 {
> 2591 BufferDesc *bufHdr;
> 2592
> -> 2593 Assert(BufferIsPinned(buffer));
> 2594
> 2595 if (BufferIsLocal(buffer))
> 2596 bufHdr = GetLocalBufferDescriptor(-buffer - 1);
>

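For anyone reading the backtrace above: frames #4 to #6 suggest that visibilitymap_set() was called with heapBuf=0 during redo, and that value reached BufferGetBlockNumber(), where Assert(BufferIsPinned(buffer)) fails because buffer 0 is never considered pinned. The sketch below is only a simplified model of that check, following the expanded condition shown in frame #3. The array names, the sizes, and the can_inspect_heap_buffer() guard are hypothetical stand-ins, not PostgreSQL code and not the fix used in the patch.

```c
#include <stdbool.h>

/* Stand-ins for illustration only; the real definitions live in
 * PostgreSQL's buffer manager. */
typedef int Buffer;
#define InvalidBuffer 0

enum { NBuffers = 16, NLocBuffer = 4 };   /* hypothetical sizes */

static int shared_pins[NBuffers];    /* stand-in for GetPrivateRefCount() */
static int local_pins[NLocBuffer];   /* stand-in for LocalRefCount[] */

/* Follows the shape of the condition in frame #3:
 * buffer 0 is never pinned, negative numbers are local buffers,
 * positive numbers are shared buffers. */
static bool
buffer_is_pinned(Buffer buffer)
{
    if (buffer == InvalidBuffer)
        return false;
    if (buffer < 0)
        return local_pins[-buffer - 1] > 0;
    return shared_pins[buffer - 1] > 0;
}

/* Hypothetical guard: only look at the heap buffer when the caller
 * actually passed one, as opposed to the buf=0 seen in frame #5. */
static bool
can_inspect_heap_buffer(Buffer heapBuf)
{
    return heapBuf != InvalidBuffer && buffer_is_pinned(heapBuf);
}
```

With this model, buffer_is_pinned(0) is false, which corresponds to the assertion failure at bufmgr.c:2593, while a caller that checks for InvalidBuffer first never reaches the pinned test.
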
The latest proposed patch still has problems.
Closed in the 2016-11 commitfest with "moved to next CF" status because it
is a bug fix patch.
Please feel free to update the status once you submit the updated patch.

Regards,
Hari Babu
Fujitsu Australia
