| From: | Bryan Green <dbryan(dot)green(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: [Patch] Windows relation extension failure at 2GB and 4GB |
| Date: | 2025-11-06 17:17:52 |
| Message-ID: | 246ff974-8599-4c06-b5d9-7763f12ad9ca@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 11/5/2025 11:05 PM, Michael Paquier wrote:
> On Tue, Oct 28, 2025 at 09:42:11AM -0500, Bryan Green wrote:
>> I found two related bugs in PostgreSQL's Windows port that prevent files
>> from exceeding 2GB. While unlikely to affect most installations (default 1GB
>> segments), the code is objectively wrong and worth fixing.
>>
>> The first bug is a pervasive use of off_t where pgoff_t should be used. On
>> Windows, off_t is only 32-bit, causing signed integer overflow at exactly
>> 2GB (2^31 bytes). PostgreSQL already defined pgoff_t as __int64 for this
>> purpose and some function declarations in headers already use it, but the
>> implementations weren't updated to match.
>
> Ugh. That's the same problem as "long", which is 8 bytes wide
> everywhere except WIN32. Removing traces of "long" from the code has
> been a continuous effort over the years because of these silent
> overflow issues.
>
>> After fixing all those off_t issues, there's a second bug at 4GB in the
>> Windows implementations of pg_pwrite()/pg_pread() in win32pwrite.c and
>> win32pread.c. The current implementation uses an OVERLAPPED structure for
>> positioned I/O, but only sets the Offset field (low 32 bits), leaving
>> OffsetHigh at zero. This works up to 4GB by accident, but beyond that,
>> offsets wrap around.
>>
>> I can reproduce both bugs reliably with --with-segsize=8. The file grows to
>> exactly 2GB and fails with "could not extend file: Invalid argument" despite
>> having 300GB free. After fixing the off_t issues, it grows to exactly 4GB
>> and hits the OVERLAPPED bug. Both are independently verifiable.
>
> The most popular option in terms of installation on Windows is the EDB
> installer, where I bet that a file segment size of 1GB is what's
> embedded in the code compiled. This argument would not hold with WAL
> segment sizes if we begin to support even higher sizes than 1GB at
> some point, and we use pwrite() in the WAL insert code. That should
> not be a problem even in the near future.
>
>> It's safe for all platforms since pgoff_t equals off_t on Unix where off_t
>> is already 64-bit. Only Windows behavior changes.
>
> win32_port.h and port.h say so, yeah.
>
>> Not urgent since few people hit this in practice, but it's clearly wrong
>> code.
>> Someone building with larger segments would see failures at 2GB and
>> potential corruption at 4GB. Windows supports files up to 16 exabytes - no
>> good reason to limit PostgreSQL to 2GB.
>
> The same kind of limitations with 4GB files existed with stat() and
> fstat(), but it was much more complicated than what you are doing
> here, where COPY was not able to work with files larger than 4GB on
> WIN32. See the saga from bed90759fcbc.
>
>> I have attached the patch to fix the relation extension problems for Windows
>> to this email.
>>
>> Can provide the other patches that changes off_t for pgoff_t in the rest of
>> the code if there's interest in fixing this.
>
> Yeah, I think that we should rip out these issues, and move to the
> more portable pgoff_t across the board. I doubt that any of this
> could be backpatched due to the potential ABI breakages these
> signatures changes would cause. Implementing things in incremental
> steps is more sensible when it comes to such changes, as a revert
> blast can be reduced if a portion is incorrectly handled.
>
> I'm seeing as well the things you are pointing in buffile.c. These
> should be fixed as well. The WAL code is less annoying due to the 1GB
> WAL segment size limit, still consistency across the board makes the
> code easier to reason about, at least.
>
> Among the files you have mentioned, there is also copydir.c.
>
> pg_rewind seems also broken with files larger than 4GB, from what I
> can see in libpq_source.c and filemap.c.. Worse. Oops.
>
> - /* Note that this changes the file position, despite not using it. */
> - overlapped.Offset = offset;
> + overlapped.Offset = (DWORD) offset;
> + overlapped.OffsetHigh = (DWORD) (offset >> 32);
>
> Based on the docs at [1], yes, this change makes sense.
>
> It seems to me that a couple of extra code paths should be handled in
> the first patch, and I have spotted three of them. None of them are
> critical as they are related to WAL segments, just become masked and
> inconsistent:
> - xlogrecovery.c, pg_pread() called with a cast to off_t. WAL
> segments have a max size of 1GB, meaning that we're OK.
> - xlogreader.c, pg_pread() with a cast to off_t.
> - walreceiver.c, pg_pwrite().
>
> Except for these three spots, the first patch looks like a cut good
> enough on its own.
>
Latest patch attached that includes these code paths.
> Glad to see someone who takes time to spend time on this kind of
> stuff with portability in mind, by the way.
>
> [1]: https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped
> --
> Michael
Thanks for the quick reviewing.
--
Bryan Green
EDB: https://www.enterprisedb.com
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Fix-Windows-file-IO.patch | text/plain | 23.9 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2025-11-06 17:45:30 | Re: [Patch] Windows relation extension failure at 2GB and 4GB |
| Previous Message | Álvaro Herrera | 2025-11-06 16:58:50 | Re: warning on the current head |