Obscure lwlock assertion failure if write fails in initdb

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Obscure lwlock assertion failure if write fails in initdb
Date: 2023-12-13 22:22:42
Message-ID: CA+hUKGJyyp2MGM0ja_QVHwrk0hheqz7pJ_EfEB4iXyMnXwtYkg@mail.gmail.com
Lists: pgsql-hackers

Hi,

In all releases, if bootstrap mode's checkpoint gets an error (ENOSPC,
EDQUOT, EIO, ...) or a short write in md.c, ERROR is promoted to FATAL
and the shmem_exit resowner machinery reaches this:

running bootstrap script ... 2023-12-14 10:38:02.320 NZDT [1409162]
FATAL: could not write block 42 in file "base/1/1255": wrote only
4096 of 8192 bytes
2023-12-14 10:38:02.320 NZDT [1409162] HINT: Check free disk space.
2023-12-14 10:38:02.320 NZDT [1409162] CONTEXT: writing block 42 of
relation base/1/1255
TRAP: failed Assert("!LWLockHeldByMe(BufferDescriptorGetContentLock(buf))"),
File: "bufmgr.c", Line: 2409, PID: 1409162
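
As an illustration of the "error or short write" distinction behind the first FATAL message above, here is a minimal standalone sketch. It is not the md.c code: write_block(), BlockWriteStatus, and BLOCK_SIZE are hypothetical names chosen for this example, and the messages only imitate the wording in the log. A negative return from pwrite() is reported via errno, while a non-negative count smaller than the request is reported as a short write.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical constant; 8192 matches the block size seen in the log. */
#define BLOCK_SIZE 8192

/* Hypothetical result type for this sketch. */
typedef enum
{
    BLOCK_WRITE_OK,     /* the whole block was written */
    BLOCK_WRITE_ERROR,  /* pwrite() failed and set errno */
    BLOCK_WRITE_SHORT   /* pwrite() wrote fewer bytes than requested */
} BlockWriteStatus;

/*
 * Write one block at its block-aligned offset and classify the outcome.
 * A human-readable description is left in msg (empty on success).
 */
static BlockWriteStatus
write_block(int fd, const char *buf, unsigned blocknum,
            char *msg, size_t msglen)
{
    off_t   offset = (off_t) blocknum * BLOCK_SIZE;
    ssize_t nbytes;

    assert(buf != NULL && msg != NULL && msglen > 0);

    nbytes = pwrite(fd, buf, BLOCK_SIZE, offset);

    if (nbytes == BLOCK_SIZE)
    {
        msg[0] = '\0';
        return BLOCK_WRITE_OK;
    }

    if (nbytes < 0)
    {
        /* Hard failure: errno says why (ENOSPC, EDQUOT, EIO, ...). */
        snprintf(msg, msglen, "could not write block %u: %s",
                 blocknum, strerror(errno));
        return BLOCK_WRITE_ERROR;
    }

    /* Short write: no errno is set, so only the byte counts are reported. */
    snprintf(msg, msglen, "could not write block %u: wrote only %zd of %d bytes",
             blocknum, nbytes, BLOCK_SIZE);
    return BLOCK_WRITE_SHORT;
}
```

A caller would treat both non-OK results as failures; the short-write case is the one that typically earns a "check free disk space" style hint, since the kernel gave no errno to explain it.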

It's really hard to hit because we'd normally expect smgrextend() to
get the error first, and when it does it looks something like this:

running bootstrap script ... 2023-12-14 10:22:41.940 NZDT [1378512]
FATAL: could not extend file "base/1/1255": wrote only 4096 of 8192
bytes at block 42
2023-12-14 10:22:41.940 NZDT [1378512] HINT: Check free disk space.
2023-12-14 10:22:41.940 NZDT [1378512] PANIC: cannot abort
transaction 1, it was already committed
Aborted (core dumped)

A copy-on-write (COW) file system might succeed in smgrextend() and
then fail in smgrwrite(), and any system might fail here with some
other errno.

It's an extremely well hidden edge case and doesn't matter to users:
initdb failed for lack of space or worse, the message is clear and the
rest is meaningless detail of interest to developers with assertion
builds. I only happened to notice because I've been testing short
write and error scenarios via artificially rigged up means for my
vectored I/O work. No patch, I just wanted to flag this obscure
pre-existing problem spotted in passing.
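
For anyone wanting to see a short write without filling a disk, one possible approach (an assumption of this sketch, not necessarily the rigging used for the testing described above) is to lower RLIMIT_FSIZE temporarily. POSIX specifies that a write crossing the file size limit transfers only as many bytes as fit, so a request for 8192 bytes under a 4096-byte limit should come back short. The helper name write_with_size_limit() is hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Attempt a `request`-byte pwrite() at offset 0 while the process file size
 * limit is temporarily capped at `limit` bytes.  Returns whatever pwrite()
 * returned, or -1 if the limit could not be changed.
 */
static ssize_t
write_with_size_limit(int fd, size_t request, rlim_t limit)
{
    static char   buf[65536];
    struct rlimit saved;
    struct rlimit capped;
    void        (*old_handler) (int);
    ssize_t       nbytes;

    assert(request <= sizeof(buf));
    memset(buf, 'x', request);

    if (getrlimit(RLIMIT_FSIZE, &saved) != 0)
        return -1;

    /* Ignore SIGXFSZ so that hitting the limit does not kill the process. */
    old_handler = signal(SIGXFSZ, SIG_IGN);

    capped = saved;
    capped.rlim_cur = limit;    /* lower only the soft limit */
    if (setrlimit(RLIMIT_FSIZE, &capped) != 0)
    {
        if (old_handler != SIG_ERR)
            signal(SIGXFSZ, old_handler);
        return -1;
    }

    nbytes = pwrite(fd, buf, request, 0);

    /* Restore the original limit and signal disposition. */
    setrlimit(RLIMIT_FSIZE, &saved);
    if (old_handler != SIG_ERR)
        signal(SIGXFSZ, old_handler);

    return nbytes;
}
```

Calling write_with_size_limit(fd, 8192, 4096) on an empty regular file is expected to return 4096 on Linux, mirroring the "wrote only 4096 of 8192 bytes" case. This only exercises plain write paths in a test program; it says nothing about how the server itself behaves.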
