Fix checkpointer PANIC due to missing fsync cancel in mdunlinkfork()

From: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Fix checkpointer PANIC due to missing fsync cancel in mdunlinkfork()
Date: 2026-04-14 01:55:28
Message-ID: CAHg+QDdH+uBc0deP7uPE5rBkn0EjdkVC2_8ykK3KV90V4z6_5Q@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

I think I found a bug in mdunlinkfork() that causes the checkpointer
to PANIC with "could not fsync file ... No such file or directory" when
a relation is moved between tablespaces and the old tablespace is dropped.

mdunlinkfork() has two code paths for unlinking a relation segment file:
(1) The 'if' branch (isRedo, binary upgrade, non-MAIN forknum, or temp
relations) correctly calls register_forget_request() to cancel any pending
fsync before unlinking. (2) The 'else' branch that handles normal
MAIN_FORKNUM unlink is missing register_forget_request().
This leaves a stale SYNC_REQUEST entry in the checkpointer's
pendingOps hash table.

The comment in ProcessSyncRequests() states below:

/*
* The fsync table could contain requests to fsync segments that
* have been deleted (unlinked) by the time we get to them. Rather
* than just hoping an ENOENT (or EACCES on Windows) error can be
* ignored, what we do on error is absorb pending requests and
* then retry. Since mdunlink() queues a "cancel" message before
* actually unlinking, the fsync request is guaranteed to be
* marked canceled after the absorb if it really was this case.
* DROP DATABASE likewise has to tell us to forget fsync requests
* before it starts deletions.
*/

but this guarantee is not provided by the else branch.

Repro:
Reproducing this is not easy as it is a race but the test below
when repeated sufficient number of times you can see ocasionally
checkpointer PANIC. Though I used TABLESPACE for somewhat
consistent repro, this race exists for operations like DROP table,
TRUNCATE table as well I believe.

CREATE TABLESPACE ts LOCATION '/tmp/ts';
CREATE TABLE t (id int, pad text) TABLESPACE ts;
INSERT INTO t SELECT g, repeat('x', 500) FROM generate_series(1,50000) g;

-- Concurrent updates to dirty shared buffers (background sessions)
UPDATE t SET pad = repeat('y', 500) WHERE id <= 25000;

-- Move table away; FlushRelationBuffers sends SYNC_REQUESTs for old path
ALTER TABLE t SET TABLESPACE pg_default;

-- Drop tablespace forces checkpoint + removes directory
DROP TABLESPACE ts;

-- Next checkpoint hits stale entry -> PANIC
CHECKPOINT;

PANIC: could not fsync file "pg_tblspc/41343/PG_19_202604061/5/41348": No
such file or directory
LOG: checkpointer process (PID 166749) was terminated by signal 6: Aborted
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing

The fix is to add register_forget_request() in the else branch of
mdunlinkfork(), before register_unlink_segment(), matching what the
'if' branch already does.

Thoughts?

Thanks,
Satya

Attachment Content-Type Size
0001-Fix-checkpointer-PANIC-due-to-missing-fsync-cancel-i.patch application/octet-stream 1.2 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-04-14 02:49:38 Re: Bug: Rule actions see wrong values for generated columns (NEW.gen reads OLD value)
Previous Message Chengpeng Yan 2026-04-14 01:34:02 Re: [PATCH] ANALYZE: hash-accelerate MCV tracking for equality-only types