Re: Failed to delete old ReorderBuffer spilled files

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: atorikoshi <torikoshi_atsushi_z2(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failed to delete old ReorderBuffer spilled files
Date: 2018-01-05 14:53:38
Message-ID: 20180105145338.geiwbicz2t6s67e7@alvherre.pgsql
Lists: pgsql-hackers

Thomas Munro wrote:
> On Wed, Nov 22, 2017 at 12:27 AM, atorikoshi
> <torikoshi_atsushi_z2(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > [set_final_lsn_2.patch]
>
> Hi Torikoshi-san,
>
> FYI "make check" in contrib/test_decoding fails a couple of isolation
> tests, one with an assertion failure for my automatic patch tester[1].
> Same result on my laptop:
>
> test ondisk_startup ... FAILED (test process exited with exit code 1)
> test concurrent_ddl_dml ... FAILED (test process exited with exit code 1)
>
> TRAP: FailedAssertion("!(!dlist_is_empty(head))", File:
> "../../../../src/include/lib/ilist.h", Line: 458)

I also saw crashes a couple of times while testing this patch, but they
were several completely different crashes. I have not been able to
reproduce the crash you show, though I've run this in 9.4 and master
many times.

For example, I got a backtrace that looks like this in 9.6:

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f19ccb913fa in __GI_abort () at abort.c:89
#2 0x000055e7511f451b in errfinish (dummy=<optimized out>)
    at /pgsql/source/REL9_6_STABLE/src/backend/utils/error/elog.c:557
#3 0x000055e750ed732b in XLogFileInit (logsegno=1,
    use_existent=use_existent@entry=0x7ffdbc34ab6f "\001\002", use_lock=use_lock@entry=1 '\001')
    at /pgsql/source/REL9_6_STABLE/src/backend/access/transam/xlog.c:3023
#4 0x000055e750edb227 in XLogWrite (WriteRqst=..., flexible=flexible@entry=0 '\000')
    at /pgsql/source/REL9_6_STABLE/src/backend/access/transam/xlog.c:2258
#5 0x000055e750ee162d in XLogBackgroundFlush ()
    at /pgsql/source/REL9_6_STABLE/src/backend/access/transam/xlog.c:2894

Then, in 9.4, I saw this one:

creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... FATAL: could not open directory "pg_logical/snapshots": No such file or directory
STATEMENT: CREATE DATABASE template0;

WARNING: could not remove file or directory "base/12148": No such file or directory
WARNING: some useless files may be left behind in old database directory "base/12148"
FATAL: could not access status of transaction 0
DETAIL: Could not open file "pg_clog/0000": No such file or directory.
child process exited with exit code 1

What this indicates to me is that perhaps the test harness is doing
stupid things such as running two servers concurrently in the same
datadir, so they overwrite one another. If I take out the "-j2" from
make, this no longer reproduces.

Therefore, I'm going to push this patch shortly because clearly this
problem is not its fault.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
