Checkpointer crashes with "PANIC: could not fsync file "pg_tblspc/.."

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Checkpointer crashes with "PANIC: could not fsync file "pg_tblspc/.."
Date: 2021-12-21 11:17:23
Message-ID: CAFiTN-szX=ayO80EnSWonBu1YMZrpOr9V0R3BzHBSjMrMPAeMg@mail.gmail.com
Lists: pgsql-hackers

While testing the case below on a hot standby setup (with the latest
code), I noticed that the checkpointer process crashed with the
$subject error. From what I observed, a SYNC_REQUEST is registered
when tuples are inserted into the table, and a SYNC_UNLINK_REQUEST is
registered later by ALTER TABLE ... SET TABLESPACE, which looks fine so
far. However, only when the standby is connected, the underlying table
file in the old tablespace has already been deleted by that point. In
AbsorbFsyncRequests we don't do anything about the SYNC_REQUEST even
though there is a SYNC_UNLINK_REQUEST for the same file, and since the
underlying file is already gone, the checkpointer crashes while
processing the SYNC_REQUEST.

I have spent some time on this but could not figure out how the
relfilenode file in the old tablespace is getting deleted. If I
disconnect the standby, the file is not deleted, so I am not sure what
role the walsender plays in deleting the file before the checkpointer
processes the unlink request.
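
To make the sequence described above concrete, here is a toy Python model
of it. It is only a sketch under stated assumptions: the request names echo
the ones mentioned in this mail, but the pending list, process_sync_requests()
and simulate() are hypothetical simplifications and not PostgreSQL's sync.c
logic. It shows only that fsyncing a queued path fails with ENOENT once the
file has been removed ahead of the queued unlink request.

```python
import errno
import os
import tempfile

# Toy model only: the names echo the request types mentioned in the mail,
# but the queue and the processing loop are hypothetical simplifications.
SYNC_REQUEST = "SYNC_REQUEST"
SYNC_UNLINK_REQUEST = "SYNC_UNLINK_REQUEST"


def process_sync_requests(pending):
    """Fsync every path queued with SYNC_REQUEST; map failed paths to errno."""
    failures = {}
    for kind, path in pending:
        if kind != SYNC_REQUEST:
            continue  # in this model, unlink requests are handled later
        try:
            fd = os.open(path, os.O_RDWR)
        except OSError as exc:
            failures[path] = exc.errno
            continue
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    return failures


def simulate(file_removed_early):
    """Queue a sync and an unlink request for one file, then process syncs."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "16387")
        with open(path, "w") as f:
            f.write("tuple data")
        pending = [(SYNC_REQUEST, path)]             # queued by the INSERT
        pending.append((SYNC_UNLINK_REQUEST, path))  # queued by SET TABLESPACE
        if file_removed_early:
            os.unlink(path)  # the file vanishes before the checkpoint runs
        return process_sync_requests(pending)
```

Calling simulate(False) returns an empty dict, while simulate(True) reports
errno.ENOENT for the queued path. The model records the failure and carries
on, whereas in the server the same condition ends in the PANIC shown below.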

postgres[8905]=# create tablespace tab location
'/home/dilipkumar/work/PG/install/bin/test';
CREATE TABLESPACE
postgres[8905]=# create tablespace tab1 location
'/home/dilipkumar/work/PG/install/bin/test1';
CREATE TABLESPACE
postgres[8905]=# create database test tablespace tab;
CREATE DATABASE
postgres[8905]=# \c test
You are now connected to database "test" as user "dilipkumar".
test[8912]=# create table t( a int PRIMARY KEY,b text);
CREATE TABLE
test[8912]=# insert into t values (generate_series(1,10), 'aaa');
INSERT 0 10
test[8912]=# alter table t set tablespace tab1 ;
ALTER TABLE
test[8912]=# CHECKPOINT ;
WARNING: 57P02: terminating connection because of crash of another
server process

log shows:
PANIC: could not fsync file
"pg_tblspc/16384/PG_15_202112131/16386/16387": No such file or
directory
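
For readers unfamiliar with the layout, the path in that message has the form
pg_tblspc/<tablespace OID>/<version directory>/<database OID>/<relfilenode>.
The helper below is a small illustrative sketch for splitting such a path
into its parts. The names TablespaceRelPath and parse_tblspc_path are
hypothetical, and the sketch assumes a main-fork, first-segment file name
(no "_fsm" or ".1" style suffix).

```python
from typing import NamedTuple


class TablespaceRelPath(NamedTuple):
    tablespace_oid: int
    version_dir: str
    database_oid: int
    relfilenode: int


def parse_tblspc_path(path):
    """Split a pg_tblspc-relative relation path into its components.

    Assumes a main-fork, first-segment file name; fork or segment
    suffixes are not handled in this sketch.
    """
    parts = path.split("/")
    if len(parts) != 5 or parts[0] != "pg_tblspc":
        raise ValueError(f"not a pg_tblspc relation path: {path!r}")
    _, spc, version_dir, db, rel = parts
    return TablespaceRelPath(int(spc), version_dir, int(db), int(rel))
```

For the path in the PANIC message this gives tablespace OID 16384, version
directory PG_15_202112131, database OID 16386 and relfilenode 16387. The
OIDs can then be matched against pg_tablespace and pg_database to confirm
which tablespace and database the missing file belonged to.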

backtrace:
#0 0x00007f2f865ff387 in raise () from /lib64/libc.so.6
#1 0x00007f2f86600a78 in abort () from /lib64/libc.so.6
#2 0x0000000000b13da3 in errfinish (filename=0xcf283f "sync.c", ..
#3 0x0000000000978dc7 in ProcessSyncRequests () at sync.c:439
#4 0x00000000005949d2 in CheckPointGuts (checkPointRedo=67653624,
flags=108) at xlog.c:9590
#5 0x00000000005942fe in CreateCheckPoint (flags=108) at xlog.c:9318
#6 0x00000000008a80b7 in CheckpointerMain () at checkpointer.c:444

Note: This smaller test case is derived from one of the bigger
scenarios raised by Neha Sharma [1].

[1] https://www.postgresql.org/message-id/CANiYTQs0E8TcB11eU0C4eNN0tUd%3DSQqsqEtL1AVZP1%3DEnD-49A%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
