[Patch] Windows relation extension failure at 2GB and 4GB

From: Bryan Green <dbryan(dot)green(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: [Patch] Windows relation extension failure at 2GB and 4GB
Date: 2025-10-28 14:42:11
Message-ID: 0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com
Lists: pgsql-hackers

Hello,

I found two related bugs in PostgreSQL's Windows port that prevent relation
files from growing past 2GB (and, once the first is fixed, past 4GB). While
these are unlikely to affect most installations (the default segment size
is 1GB), the code is objectively wrong and worth fixing.

The first bug is a pervasive use of off_t where pgoff_t should be used.
On Windows, off_t is only 32-bit, so offsets wrap to negative values at
exactly 2GB (2^31 bytes). PostgreSQL already defines pgoff_t as __int64
for this purpose, and some function declarations in headers already use
it, but the implementations weren't updated to match.
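To make the 2GB boundary concrete, here is a small sketch. It assumes the usual Windows (LLP64) model, in which off_t is a 32-bit long, and simulates that with int32_t so it builds anywhere; pgoff_t is modeled as int64_t. The names win32_off_t_sketch, pgoff_t_sketch and narrow_to_off_t are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: on Windows off_t is a 32-bit long (LLP64),
 * while pgoff_t is __int64. Simulated here with fixed-width types. */
typedef int32_t win32_off_t_sketch;
typedef int64_t pgoff_t_sketch;

/* Models storing a 64-bit offset in an off_t field, or passing it through
 * an off_t parameter, on Windows. Converting an out-of-range value to a
 * signed type is implementation-defined before C23; mainstream compilers
 * (MSVC, GCC, Clang) keep the low 32 bits, so 2^31 becomes INT32_MIN. */
static win32_off_t_sketch
narrow_to_off_t(pgoff_t_sketch offset)
{
    return (win32_off_t_sketch) offset;
}
```

With this, 2^31 - 1 bytes survives unchanged, while an offset of exactly 2^31 comes out as a negative number, which is the "Invalid argument" failure mode described below.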

The problem shows up in multiple layers:

In fd.c and fd.h, the VfdCache structure's fileSize field uses off_t, as
do FileSize(), FileTruncate(), and the other File* functions. These form
the core file I/O abstraction layer that everything else builds on.

In md.c, _mdnblocks() uses off_t for its calculations. The actual
arithmetic for computing file offsets is fine - the casts to pgoff_t
work correctly - but passing these values through functions with off_t
parameters truncates them.
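A simplified sketch of why a truncated size breaks block counting. The function below is hypothetical and only loosely modeled on _mdnblocks(): it divides a segment file's size by the block size and treats a negative size as an error. It assumes the default 8KB block size.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ_SKETCH 8192      /* assumed: PostgreSQL's default block size */

/* Hypothetical simplification: convert a segment file's size in bytes to a
 * number of whole blocks. A negative size, which is what a 3GB size looks
 * like after passing through a 32-bit off_t, is reported as -1 (an error). */
static int64_t
blocks_in_segment(int64_t file_size)
{
    if (file_size < 0)
        return -1;
    /* any partial block at EOF is ignored by the integer division */
    return file_size / BLCKSZ_SKETCH;
}
```

A 3GB segment yields 393216 blocks when its size is carried in 64 bits, but the same size squeezed through int32_t arrives as -1073741824 and is rejected.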

In pg_iovec.h, pg_preadv() and pg_pwritev() take off_t offset
parameters, truncating any 64-bit offsets passed from above.

In file_utils.c, pg_pwrite_zeros() takes an off_t offset parameter. This
function is called by FileZero() to extend files with zeros, so it's hit
during relation extension. This was the actual culprit in my testing -
mdzeroextend() would compute a correct 64-bit offset, but it got
truncated to 32-bit (and negative) when passed to pg_pwrite_zeros().
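A sketch of that call path, assuming an 8GB segment (1048576 blocks at the default 8KB block size). The in-segment offset is computed in 64 bits and then handed to a hypothetical callee whose 32-bit parameter stands in for the old off_t offset of pg_pwrite_zeros() on Windows. All names here are illustrative, not PostgreSQL's.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ_SKETCH 8192                    /* assumed default block size */
#define RELSEG_SIZE_SKETCH UINT32_C(1048576)  /* 8GB / 8KB blocks per segment */

typedef uint32_t BlockNumberSketch;

/* Byte offset of a block within its segment file, computed in 64 bits. */
static int64_t
seg_offset(BlockNumberSketch blocknum)
{
    return (int64_t) BLCKSZ_SKETCH * (blocknum % RELSEG_SIZE_SKETCH);
}

/* Hypothetical callee with a 32-bit offset parameter: returns the offset
 * it actually received, to show what survives the narrowing. */
static int32_t
callee_with_off_t(int32_t offset)
{
    return offset;
}
```

Block 262143 starts at 2147475456 and passes through intact; block 262144 starts at exactly 2^31, and the callee sees a negative offset.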

After fixing all those off_t issues, there's a second bug at 4GB in the
Windows implementations of pg_pwrite()/pg_pread() in win32pwrite.c and
win32pread.c. The current implementation uses an OVERLAPPED structure
for positioned I/O, but only sets the Offset field (low 32 bits),
leaving OffsetHigh at zero. This works up to 4GB by accident, but beyond
that, offsets wrap around.
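The OVERLAPPED issue can be illustrated without Windows headers. The struct below is a hypothetical mirror of just the two offset fields of Win32's OVERLAPPED (both DWORDs, i.e. 32-bit unsigned); the function names are illustrative. The full version uses the same split as the line quoted in the reproduction steps, overlapped.OffsetHigh = (DWORD) (offset >> 32).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of OVERLAPPED's Offset/OffsetHigh pair. */
struct overlapped_sketch
{
    uint32_t Offset;      /* low 32 bits of the file position */
    uint32_t OffsetHigh;  /* high 32 bits of the file position */
};

/* Buggy pattern: only the low half is set, so OffsetHigh stays 0. */
static void
set_offset_low_only(struct overlapped_sketch *ov, int64_t offset)
{
    ov->Offset = (uint32_t) offset;   /* unsigned conversion keeps low 32 bits */
    ov->OffsetHigh = 0;
}

/* Fixed pattern: split the 64-bit offset across both fields. */
static void
set_offset_full(struct overlapped_sketch *ov, int64_t offset)
{
    assert(offset >= 0);
    ov->Offset = (uint32_t) offset;
    ov->OffsetHigh = (uint32_t) (offset >> 32);
}

/* The position actually used for the I/O: (OffsetHigh << 32) | Offset. */
static uint64_t
effective_position(const struct overlapped_sketch *ov)
{
    return ((uint64_t) ov->OffsetHigh << 32) | ov->Offset;
}
```

Below 4GB both setters produce the same position, which is why the bug stays hidden; at 4GB + 8KB the low-only version silently targets offset 8192 instead.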

I can reproduce both bugs reliably with --with-segsize=8. The file grows
to exactly 2GB and fails with "could not extend file: Invalid argument"
despite having 300GB free. After fixing the off_t issues, it grows to
exactly 4GB and hits the OVERLAPPED bug. Both are independently verifiable.

The fix touches nine files:
src/include/storage/fd.h - File* function declarations
src/backend/storage/file/fd.c - File* implementations and VfdCache
src/backend/storage/smgr/md.c - _mdnblocks and other functions
src/include/port/pg_iovec.h - pg_preadv/pg_pwritev signatures
src/include/common/file_utils.h - pg_pwrite_zeros declaration
src/common/file_utils.c - pg_pwrite_zeros implementation
src/include/port/win32_port.h - pg_pread/pg_pwrite declarations
src/port/win32pwrite.c - Windows pwrite implementation
src/port/win32pread.c - Windows pread implementation

It's safe for all platforms since pgoff_t equals off_t on Unix where
off_t is already 64-bit. Only Windows behavior changes.
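The portability argument rests on a pattern like the following sketch: a project-specific offset type that is a 64-bit integer on Windows and plain off_t elsewhere. The name pgoff_sketch_t is hypothetical, and PostgreSQL's actual definitions may be spelled differently.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* 64-bit on Windows, where off_t is a 32-bit long; plain off_t elsewhere. */
#ifdef _WIN32
typedef __int64 pgoff_sketch_t;
#else
typedef off_t pgoff_sketch_t;
#endif

/* Fail the build if the type cannot hold offsets past 4GB, e.g. on a
 * 32-bit Unix build without large-file support. */
_Static_assert(sizeof(pgoff_sketch_t) >= 8, "pgoff_sketch_t must be 64-bit");
```

On a typical 64-bit Unix build the typedef is simply off_t, so switching signatures from off_t to pgoff_t changes nothing there.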

That said, I'm finding off_t used in many other places throughout the
codebase - buffile.c, various other file utilities such as backup and
archive, probably more. This is likely causing latent bugs elsewhere on
Windows, though most are masked by the 1GB default segment size. I'm
investigating the full scope, but I think this needs to be broken up
into multiple patches. The core file I/O layer (fd.c, md.c,
pg_pwrite/pg_pread) should probably go first since that's what's
actively breaking file extension.

Not urgent, since few people hit this in practice, but the code is clearly
wrong. Someone building with larger segments would see failures at 2GB and
potential corruption at 4GB. Windows supports files up to 16 exabytes;
there is no good reason to limit PostgreSQL to 2GB.

Attached is a patch that fixes the Windows relation extension problems.

I can provide follow-up patches that change off_t to pgoff_t in the rest
of the code if there's interest in fixing this.

To reproduce the bugs on Windows:

1) Build with a large (8GB) segment size:
meson setup build --prefix=C:\pgsql-test -Dsegsize=8
2) Create a large table and insert data that will make it bigger than 2GB.

CREATE TABLE large_test (
id bigserial PRIMARY KEY,
data1 text,
data2 text,
data3 text
);

INSERT INTO large_test (data1, data2, data3)
SELECT
repeat('A', 300),
repeat('B', 300),
repeat('C', 300)
FROM generate_series(1, 5000000);

SELECT pg_size_pretty(pg_relation_size('large_test'));

At this point the first bug surfaces: the INSERT fails with "could not
extend file: Invalid argument" once the segment file reaches 2GB.

3) To reproduce the second bug, apply the patch and then comment out
'overlapped.OffsetHigh = (DWORD) (offset >> 32);' in win32pwrite.c.
4) With step 3 done, run the test from step 2 again. If you watch the
data/base/N/xxxxx file grow, you will see it get past 2GB but now fail
at 4GB.

BG

Attachment: 0001-Fix-Windows-file-IO.patch (text/plain, 21.5 KB)
