| From: | SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
| Subject: | Fix checkpointer PANIC due to missing fsync cancel in mdunlinkfork() |
| Date: | 2026-04-14 01:55:28 |
| Message-ID: | CAHg+QDdH+uBc0deP7uPE5rBkn0EjdkVC2_8ykK3KV90V4z6_5Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi hackers,
I think I found a bug in mdunlinkfork() that causes the checkpointer
to PANIC with "could not fsync file ... No such file or directory" when
a relation is moved between tablespaces and the old tablespace is dropped.
mdunlinkfork() has two code paths for unlinking a relation segment file:
(1) The 'if' branch (isRedo, binary upgrade, non-MAIN forknum, or temp
relations) correctly calls register_forget_request() to cancel any pending
fsync before unlinking. (2) The 'else' branch that handles normal
MAIN_FORKNUM unlink is missing register_forget_request().
This leaves a stale SYNC_REQUEST entry in the checkpointer's
pendingOps hash table.
The comment in ProcessSyncRequests() states below:
/*
* The fsync table could contain requests to fsync segments that
* have been deleted (unlinked) by the time we get to them. Rather
* than just hoping an ENOENT (or EACCES on Windows) error can be
* ignored, what we do on error is absorb pending requests and
* then retry. Since mdunlink() queues a "cancel" message before
* actually unlinking, the fsync request is guaranteed to be
* marked canceled after the absorb if it really was this case.
* DROP DATABASE likewise has to tell us to forget fsync requests
* before it starts deletions.
*/
but this guarantee is not provided by the else branch.
Repro:
Reproducing this is not easy as it is a race but the test below
when repeated sufficient number of times you can see ocasionally
checkpointer PANIC. Though I used TABLESPACE for somewhat
consistent repro, this race exists for operations like DROP table,
TRUNCATE table as well I believe.
CREATE TABLESPACE ts LOCATION '/tmp/ts';
CREATE TABLE t (id int, pad text) TABLESPACE ts;
INSERT INTO t SELECT g, repeat('x', 500) FROM generate_series(1,50000) g;
-- Concurrent updates to dirty shared buffers (background sessions)
UPDATE t SET pad = repeat('y', 500) WHERE id <= 25000;
-- Move table away; FlushRelationBuffers sends SYNC_REQUESTs for old path
ALTER TABLE t SET TABLESPACE pg_default;
-- Drop tablespace forces checkpoint + removes directory
DROP TABLESPACE ts;
-- Next checkpoint hits stale entry -> PANIC
CHECKPOINT;
PANIC: could not fsync file "pg_tblspc/41343/PG_19_202604061/5/41348": No
such file or directory
LOG: checkpointer process (PID 166749) was terminated by signal 6: Aborted
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing
The fix is to add register_forget_request() in the else branch of
mdunlinkfork(), before register_unlink_segment(), matching what the
'if' branch already does.
Thoughts?
Thanks,
Satya
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Fix-checkpointer-PANIC-due-to-missing-fsync-cancel-i.patch | application/octet-stream | 1.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-04-14 02:49:38 | Re: Bug: Rule actions see wrong values for generated columns (NEW.gen reads OLD value) |
| Previous Message | Chengpeng Yan | 2026-04-14 01:34:02 | Re: [PATCH] ANALYZE: hash-accelerate MCV tracking for equality-only types |