Re: Curious buildfarm failures

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Curious buildfarm failures
Date: 2013-01-14 21:50:16
Message-ID: 20130114215016.GA22155@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-14 16:35:48 -0500, Tom Lane wrote:
> Since commit 2065dd2834e832eb820f1fbcd16746d6af1f6037, there have been
> a few buildfarm failures along the lines of
>
> -- Commit table drop
> COMMIT PREPARED 'regress-two';
> ! PANIC: failed to re-find shared proclock object
> ! PANIC: failed to re-find shared proclock object
> ! connection to server was lost
>
> Evidently I bollixed something, but what? I've been unable to reproduce
> this locally so far. Anybody see what's wrong?
>
> Another thing is that dugong has been reproducibly failing with
>
> drop cascades to table testschema.atable
> -- Should succeed
> DROP TABLESPACE testspace;
> + ERROR: tablespace "testspace" is not empty
>
> since the elog-doesn't-return patch (b853eb97) went in. Maybe this is
> some local problem there but I'm suspicious that there's a connection.
> But what?
>
> Any insights out there?

It also has:

LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: autovacuum launcher shutting down
LOG: shutting down
FATAL: could not open file "base/16384/28182": No such file or directory
CONTEXT: writing block 6 of relation base/16384/28182
TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1743)
LOG: checkpointer process (PID 30366) was terminated by signal 6: Aborted
LOG: terminating any other active server processes
LOG: abnormal database system shutdown

================== stack trace: pgsql.9958/src/test/regress/tmp_check/data/core ==================
Using host libthread_db library "/lib/tls/libthread_db.so.1".

warning: Can't read pathname for load map: Input/output error.
Core was generated by `postgres: checkpointer process '.
Program terminated with signal 6, Aborted.

#0 0xa000000000010620 in __kernel_syscall_via_break ()
#0 0xa000000000010620 in __kernel_syscall_via_break ()
#1 0x2000000000953bb0 in raise () from /lib/tls/libc.so.6.1
#2 0x2000000000956df0 in abort () from /lib/tls/libc.so.6.1
#3 0x4000000000b4b510 in ExceptionalCondition (
conditionName=0x4000000000d76390 "!(PrivateRefCount[i] == 0)",
errorType=0x4000000000d06500 "FailedAssertion",
fileName=0x4000000000d76260 "bufmgr.c", lineNumber=1743) at assert.c:54
#4 0x40000000007a7d20 in AtProcExit_Buffers (code=1, arg=59) at bufmgr.c:1743
#5 0x40000000007c4e50 in shmem_exit (code=1) at ipc.c:221
#6 0x40000000007c4fa0 in proc_exit_prepare (code=1) at ipc.c:181
#7 0x40000000007c4ab0 in proc_exit (code=1) at ipc.c:96
#8 0x4000000000b5d390 in errfinish (dummy=0) at elog.c:518
#9 0x4000000000823380 in _mdfd_getseg (reln=0x6000000000155420,
forknum=1397792, blkno=6, skipFsync=0 '\0', behavior=EXTENSION_FAIL)
at md.c:577
#10 0x400000000081e5c0 in mdwrite (reln=0x6000000000155420,
forknum=MAIN_FORKNUM, blocknum=6, buffer=0x2000000001432ea0 "",
skipFsync=0 '\0') at md.c:735
#11 0x4000000000824690 in smgrwrite (reln=0x6000000000155420,
forknum=MAIN_FORKNUM, blocknum=6, buffer=0x2000000001432ea0 "",
skipFsync=0 '\0') at smgr.c:534
#12 0x400000000079e510 in FlushBuffer (buf=0x1, reln=0x6000000000155420)
at bufmgr.c:1941
#13 0x40000000007a10b0 in SyncOneBuffer (buf_id=0, skip_recently_used=0 '\0')
at bufmgr.c:1677
#14 0x40000000007a0c00 in CheckPointBuffers (flags=5) at bufmgr.c:1284
#15 0x40000000001fcbf0 in CheckPointGuts (checkPointRedo=80827000, flags=5)
at xlog.c:7391
#16 0x40000000001fb2a0 in CreateCheckPoint (flags=5) at xlog.c:7240
#17 0x40000000001f6820 in ShutdownXLOG (code=14699520,
arg=4611686018440093920) at xlog.c:6823
#18 0x400000000072d780 in _setjmp_lpad_CheckpointerMain_0$0$18 ()
at checkpointer.c:413
#19 0x4000000000235810 in AuxiliaryProcessMain (argc=496536,
argv=0x60000fffff80e520) at bootstrap.c:433
#20 0x40000000007172b0 in StartChildProcess (type=508288) at postmaster.c:4956
#21 0x4000000000713f50 in reaper (postgres_signal_arg=30365)
at postmaster.c:2568
#22 <signal handler called>
#23 0xa000000000010620 in __kernel_syscall_via_break ()
#24 0x2000000000953f70 in sigprocmask () from /lib/tls/libc.so.6.1
#25 0x4000000000720480 in ServerLoop () at postmaster.c:1521
#26 0x400000000071d9d0 in PostmasterMain (argc=6, argv=0x60000000000d85e0)
at postmaster.c:1244
#27 0x4000000000577a30 in main (argc=6, argv=0x60000000000d8010) at main.c:197

in the log. So it seems it could also be related to the locking
changes, although I don't immediately see why.
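
For anyone reading the backtrace cold, here is a toy model of one way to read frames #3 to #13: SyncOneBuffer pins the buffer, the write underneath FlushBuffer fails with FATAL, the process goes straight to proc_exit, and AtProcExit_Buffers then finds the pin still held. This is only a sketch under that assumption. The Python names are hypothetical stand-ins chosen to echo the trace, not PostgreSQL code, and it says nothing about why the file was missing in the first place.

```python
# Toy model only: names echo the backtrace, none of this is PostgreSQL code.
private_ref_count = [0] * 4   # stand-in for PrivateRefCount[]
exit_callbacks = []           # stand-in for the shmem_exit callback list


class Fatal(Exception):
    """Stand-in for a FATAL error: the process exits instead of recovering."""


def at_proc_exit_buffers():
    # Mirrors the failed check: no buffer may still be pinned at exit.
    for i, count in enumerate(private_ref_count):
        if count != 0:
            raise AssertionError(f"PrivateRefCount[{i}] == {count}")


exit_callbacks.append(at_proc_exit_buffers)


def flush_buffer(buf_id, file_exists):
    # The write fails if the underlying file has already gone away.
    if not file_exists:
        raise Fatal(f"could not open file for buffer {buf_id}")


def sync_one_buffer(buf_id, file_exists):
    private_ref_count[buf_id] += 1      # pin
    flush_buffer(buf_id, file_exists)   # may raise before the unpin below
    private_ref_count[buf_id] -= 1      # unpin, skipped when Fatal is raised


def run_checkpoint(file_exists):
    """Return a short description of how the simulated process ended."""
    try:
        sync_one_buffer(0, file_exists)
        return "clean"
    except Fatal:
        # Exit path: callbacks run while the pin is still held.
        try:
            for callback in exit_callbacks:
                callback()
        except AssertionError as exc:
            return f"TRAP: {exc}"
        return "fatal exit"
```

If that reading is right, the TRAP is a follow-on effect of the FATAL, and the "could not open file" error is the part that needs explaining.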

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
