| From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
|---|---|
| To: | Bryan Green <dbryan(dot)green(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: [Patch] Windows relation extension failure at 2GB and 4GB |
| Date: | 2025-11-06 09:20:09 |
| Message-ID: | CA+hUKGJFAC2Cz=hqaoK2SOyBYqXvGy0JLyAWffe-XNBgJmniLA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Oct 29, 2025 at 3:42 AM Bryan Green <dbryan(dot)green(at)gmail(dot)com> wrote:
> That said, I'm finding off_t used in many other places throughout the
> codebase - buffile.c, various other file utilities such as backup and
> archive, probably more. This is likely causing latent bugs elsewhere on
> Windows, though most are masked by the 1GB default segment size. I'm
> investigating the full scope, but I think this needs to be broken up
> into multiple patches. The core file I/O layer (fd.c, md.c,
> pg_pwrite/pg_pread) should probably go first since that's what's
> actively breaking file extension.
The way I understand this situation, there are two kinds of file I/O,
with respect to large files:
1. Some places *have* to deal with large files (eg navigating in a
potentially large tar file), and there we should already be using
pgoff_t and the relevant system call wrappers should be using the
int64_t stuff Windows provides. These are primarily frontend code.
2. Some places use segmentation *specifically because* there are
systems with 32 bit off_t. These are mostly backend code dealing with
relation data files. The only system left with narrow off_t is
Windows.
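
As a rough illustration of category 2 (a sketch, not code from the PostgreSQL source), this is the kind of arithmetic that keeps every per-file offset small. BLCKSZ and RELSEG_SIZE are defined here as stand-ins with the default values (8kB blocks, 1GB segments); SegPos and locate_block() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the build-time constants, using the default values. */
#define BLCKSZ      8192u      /* bytes per block */
#define RELSEG_SIZE 131072u    /* blocks per segment: 1GB / 8kB */

/* Hypothetical result type: which segment file, and where inside it. */
typedef struct
{
    uint32_t segno;            /* index of the segment file */
    int64_t  offset;           /* byte offset within that segment */
} SegPos;

/* Hypothetical helper: map a relation block number to (segment, offset). */
static SegPos
locate_block(uint32_t blocknum)
{
    SegPos pos;

    pos.segno = blocknum / RELSEG_SIZE;
    pos.offset = (int64_t) (blocknum % RELSEG_SIZE) * BLCKSZ;

    /* With 1GB segments the offset always fits in a signed 32-bit off_t. */
    assert(pos.offset < INT64_C(0x80000000));
    return pos;
}
```

With these defaults, block 131071 is the last block of segment 0 and block 131072 is the first block of segment 1, so no seek target inside a single file ever reaches 2GB.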
In reality the stuff in category 1 has been developed through a
process of bug reports and patches (970b97e and 970b97e^ spring to
mind as the most recent cases I had something to do with, but see also
stat()-related stuff, and see aa5518304 where we addressed the one
spot in buffile.c that had to consider multiple segments). But the
fact that Windows can't use segments > 2GB because the fd.c and
smgr.c/md.c layers work with off_t is certainly a well known
limitation, ie specifically that relation and temporary/buf files are
special in this way. I'm mostly baffled by the fact that --relsegsize
actually *lets* you set it higher than 2 on that platform. Perhaps we
should at least backpatch a configure check or static assertion to
block that? It's not good if it compiles but doesn't actually work.
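
A minimal sketch of what such a compile-time guard could look like, assuming the check is simply "the segment size in bytes must be representable in off_t". It is not an actual patch: BLCKSZ and RELSEG_SIZE are stand-ins with default values, and OFF_T_MAX_U64, max_offset_for_width() and segment_fits() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* Stand-ins for the build-time constants, using the default values. */
#define BLCKSZ      8192
#define RELSEG_SIZE 131072     /* blocks per segment: 1GB / 8kB */

/* Hypothetical: largest value a signed off_t can hold on this platform. */
#define OFF_T_MAX_U64 ((UINT64_C(1) << (sizeof(off_t) * 8 - 1)) - 1)

/* Refuse to compile if one segment is bigger than off_t can describe. */
_Static_assert((uint64_t) RELSEG_SIZE * BLCKSZ <= OFF_T_MAX_U64,
               "segment size is too large for this platform's off_t");

/* Largest offset a signed offset type of the given width can hold. */
static uint64_t
max_offset_for_width(unsigned bytes)
{
    assert(bytes >= 1 && bytes <= 8);
    return (UINT64_C(1) << (bytes * 8 - 1)) - 1;
}

/* Runtime form of the same test, so other off_t widths can be tried. */
static int
segment_fits(uint64_t relseg_size, uint64_t blcksz, unsigned off_t_bytes)
{
    return relseg_size * blcksz <= max_offset_for_width(off_t_bytes);
}
```

Under this rule a 1GB segment passes with a 4-byte off_t, while a 2GB segment (262144 blocks of 8kB) already fails, because 2^31 is one more than the largest signed 32-bit value.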
For master I think it makes sense to clean this up, as you say,
because the fuzzy boundary between the two categories of file I/O is
bound to cause more problems, it's just unfinished business that has
been tackled piecemeal as required by bug reports... In fact, on a
thread[1] where I explored making the segment size a runtime option
specified at initdb time, I even posted patches much like yours in the
first version, spreading pgoff_t into more places, and then in a later
version it was suggested that it might be better to just block
settings that are too big for your off_t, so I did that. I probably
thought that we already did that somewhere for the current
compile-time constant...
> Not urgent since few people hit this in practice, but it's clearly wrong
> code.
Yeah. In my experience dealing with bug reports, the Windows user
community skews very heavily towards just consuming EDB's ready-built
installer. We rarely hear about configuration-level problems, so I
suppose it's not surprising that no one has ever complained that it
lets you configure it in a way that we hackers all know is certainly
going to break.