Re: BUG #16331: segfault in checkpointer with full disk

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: jmlich83(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16331: segfault in checkpointer with full disk
Date: 2020-04-01 09:04:55
Message-ID: 20200401090455.GB82418@nol
Lists: pgsql-bugs

Hi,

On Wed, Apr 01, 2020 at 08:51:56AM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 16331
> Logged by: Jozef Mlich
> Email address: jmlich83(at)gmail(dot)com
> PostgreSQL version: 12.2
> Operating system: CentOS
> Description:
>
> I can see segfaults on CentOS 7 with postgresql 12.2-2PGDG.rhel7 (from
> yum.postgresql.org). I am using multiple extensions (cstore, postgres_fdw,
> pgcrypto, dblink, etc.). It seems the crash is related to the disk running out
> of space (I am using separate partitions for / and for /var/lib/pgsql). It
> occurs a few times a day. According to the backtrace, it seems to be related
> to the checkpointer. Replication is not configured.
>
>
> [New LWP 26290]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `postgres: checkpointer
> '.
> Program terminated with signal 6, Aborted.
> #0 0x00007fe4604c1207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> 55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>
> Thread 1 (Thread 0x7fe462e148c0 (LWP 26290)):
> #0 0x00007fe4604c1207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> resultvar = 0
> pid = 26290
> selftid = 26290
> #1 0x00007fe4604c28f8 in __GI_abort () at abort.c:90
> save_stage = 2
> act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0},
> sa_mask = {__val = {0, 0, 0, 0, 0, 9268713, 70403103920717,
> 39808819211026438, 20126216749056, 70394513997832, 9268713, 70403103920719,
> 17316096998686159616, 20134806683648, 140618848608704, 140618848592800}},
> sa_flags = 1615828275, sa_restorer = 0x0}
> sigs = {__val = {32, 0 <repeats 15 times>}}
> #2 0x000000000087840a in errfinish (dummy=<optimized out>) at elog.c:552
> edata = 0xd47040 <errordata>
> elevel = 22
> oldcontext = 0x171a6d0
> econtext = 0x0
> __func__ = "errfinish"
> #3 0x0000000000706b24 in CheckPointReplicationOrigin () at origin.c:562
> tmppath = 0x9e6fa8 "pg_logical/replorigin_checkpoint.tmp"
> path = 0x9e6fd0 "pg_logical/replorigin_checkpoint"
> tmpfd = <optimized out>
> i = <optimized out>
> magic = 307747550
> crc = 4294967295
> __func__ = "CheckPointReplicationOrigin"

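That's not a bug (nor a segfault) but the expected behavior when the checkpointer
is not able to do its work. As data durability can't be guaranteed in such a
case, the checkpointer raises a PANIC-level error, which calls abort() (hence
signal 6, SIGABRT, rather than a segmentation fault) so that the whole instance
goes through an emergency restart cycle. The backtrace matches this: errfinish()
is called with elevel = 22, which is PANIC in version 12, from
CheckPointReplicationOrigin(), which writes pg_logical/replorigin_checkpoint.tmp.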
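To make the mechanism concrete, here is a minimal C sketch of the "write a
temporary file, then rename it into place, and treat any failure as fatal"
pattern. It is not PostgreSQL's code: write_state_file() and panic() are
hypothetical names, and abort() stands in for what a PANIC-level ereport()
ultimately does.

```c
/*
 * Hypothetical sketch (not PostgreSQL source): write a state file through a
 * temporary file plus rename(), treating every failure as fatal via abort(),
 * roughly what a PANIC-level error ends up doing.
 */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Report the failed step, then terminate with SIGABRT (signal 6). */
static void panic(const char *what, const char *path)
{
    fprintf(stderr, "PANIC: could not %s file \"%s\"\n", what, path);
    abort();
}

/* Write buf to tmppath, fsync it, then atomically rename it over path. */
static void write_state_file(const char *tmppath, const char *path,
                             const void *buf, size_t len)
{
    assert(tmppath != NULL && path != NULL);

    int fd = open(tmppath, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        panic("create", tmppath);

    /* On a full disk this typically fails with ENOSPC or writes short. */
    ssize_t written = write(fd, buf, len);
    if (written < 0 || (size_t) written != len)
        panic("write to", tmppath);

    if (fsync(fd) != 0)
        panic("fsync", tmppath);
    if (close(fd) != 0)
        panic("close", tmppath);

    /* rename() replaces the target atomically: readers see old or new. */
    if (rename(tmppath, path) != 0)
        panic("rename", tmppath);
}
```

On a full filesystem, the open() or write() step would typically fail and the
process would die with SIGABRT, as in the core dump above. PostgreSQL's own
durable rename also fsyncs the containing directory, which this sketch omits.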

Do you have monitoring for this filesystem? Do you see spikes in disk usage or
other strange behavior?
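If there is no monitoring yet, a minimal C sketch of a free-space check using
POSIX statvfs() could look like the following. free_space_percent() and
check_free_space() are hypothetical helpers, and the threshold is an arbitrary
assumption, not a PostgreSQL setting.

```c
/*
 * Hypothetical monitoring helper: percentage of space available to
 * unprivileged users on the filesystem containing a path, via statvfs().
 */
#include <assert.h>
#include <stdio.h>
#include <sys/statvfs.h>

/* Returns free space in percent (0-100), or -1.0 if it cannot be determined. */
static double free_space_percent(const char *path)
{
    struct statvfs st;

    assert(path != NULL);
    if (statvfs(path, &st) != 0 || st.f_blocks == 0)
        return -1.0;

    /* f_bavail excludes root-reserved blocks; both counts use f_frsize units. */
    return 100.0 * (double) st.f_bavail / (double) st.f_blocks;
}

/* Returns 1 and warns if free space is below threshold_pct, 0 if OK, -1 on error. */
static int check_free_space(const char *path, double threshold_pct)
{
    double pct = free_space_percent(path);

    if (pct < 0.0) {
        fprintf(stderr, "could not get filesystem statistics for %s\n", path);
        return -1;
    }
    if (pct < threshold_pct) {
        fprintf(stderr, "WARNING: only %.1f%% free on %s\n", pct, path);
        return 1;
    }
    return 0;
}
```

For example, calling check_free_space("/var/lib/pgsql", 10.0) periodically
would warn before the data partition fills up completely.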
