Re: BUG #15858: could not stat file - over 4GB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, williamedwinallen(at)live(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15858: could not stat file - over 4GB
Date: 2019-06-19 17:40:10
Message-ID: 16138.1560966010@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= <juanjo(dot)santamaria(at)gmail(dot)com> writes:
> On Wed, Jun 19, 2019 at 3:26 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> Windows is known for having limitations in its former implementations
>> of stat(), and the various _stat structures they use make actually
>> that much harder from a compatibility point of view:
>> https://www.postgresql.org/message-id/1803D792815FC24D871C00D17AE95905CF5099@g01jpexmbkw24

> Going through this discussion it is not clear to me if there was a
> consensus about the shape of an acceptable patch. Would something like
> the attached be suitable?

I think there's general agreement that the correct fix involves somehow
mapping stat() to _stat64() and mapping "struct stat" to "struct __stat64"
to go along with that. Beyond that, things get murky.

1. Can we assume that _stat64() and struct __stat64 exist on every Windows
version and build toolchain that we care about? Windows itself is
probably OK --- googling found a (non-authoritative) statement that these
were introduced in Windows 2K. But it's less clear whether they'll work
on builds with Cygwin, or Mingw, or Mingw-64, or how far back that support
goes. I found one statement that Mingw declares them only "#if
__MSVCRT_VERSION__ >= 0x0601".

2. Mapping stat() to _stat64() seems easy enough: we already declare
stat(a,b) as a macro on Windows, so just change it to something else.

3. What about the struct name? I proposed just "define stat __stat64",
but Robert thought that was too cute, and he's got a point --- in
particular, it's not clear to me how nicely it'd play to have both
function and object macros for the same name "stat". I see you are
proposing fixing this angle by suppressing the system definition of
struct stat and then defining it ourselves with the same contents as
struct __stat64. That might work. Ordinarily I'd be worried about
bit-rot in a struct that has to track a system definition, but Microsoft
are so religiously anal about never breaking ABI that it might be safe
to assume we don't have to worry about that.

I don't like the specific way you're proposing suppressing the system
definition of struct stat, though. "#define _CRT_NO_TIME_T" seems
like it's going to be a disaster, both because it likely has other
side-effects and because it probably doesn't do what you intend at all
on non-MSVC toolchains. We have precedents for dealing with similar
issues in, eg, plperl; and what those precedents would suggest is
doing something like

#define stat microsoft_native_stat
#include <sys/stat.h>
#undef stat

after which we could do

struct stat {
... same contents as __stat64
};

#define stat(a,b) _stat64(a,b)

Another issue here is that pgwin32_safestat() probably needs revisited
as to its scope and purpose. Its use of GetFileAttributesEx() can
presumably be dropped. I don't actually believe the header comment
claiming that stat() is not guaranteed to update the st_size field;
there's no indication of that in the Microsoft documentation. What
seems more likely is that that's a garbled version of the truth,
that you won't get a correct value of _st_size for files over 4GB.
But the test for ERROR_DELETE_PENDING might be worth keeping. So
that would lead us to

struct stat {
... same contents as __stat64
};

extern int pgwin32_safestat(const char *path, struct stat *buf);
#define stat(a,b) pgwin32_safestat(a,b)

and something like

int
pgwin32_safestat(const char *path, struct stat *buf)
{
int r;

/*
* Don't call stat(), that would just recurse back to here.
* We really want _stat64().
*/
r = _stat64(path, buf);

if (r < 0)
{
if (GetLastError() == ERROR_DELETE_PENDING)
{
/*
* File has been deleted, but is not gone from the filesystem yet.
* This can happen when some process with FILE_SHARE_DELETE has it
* open and it will be fully removed once that handle is closed.
* Meanwhile, we can't open it, so indicate that the file just
* doesn't exist.
*/
errno = ENOENT;
}
}
return r;
}

Not sure if we'd need an explicit cast to override passing struct
stat * to _stat64(). If so, a StaticAssert that sizeof(struct stat)
matches sizeof(struct __stat64) seems like a good idea.

I'd also be very strongly inclined to move pgwin32_safestat into its
own file in src/port and get rid of UNSAFE_STAT_OK. There wouldn't
be a good reason to opt out of using it once we got to this point.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-06-19 18:02:36 Re: BUG #15858: could not stat file - over 4GB
Previous Message Andrew Gierth 2019-06-19 17:06:26 Re: BUG #15855: Using 'nextval' inside INSERT RULE in case of bulk data insertion.

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2019-06-19 17:46:46 Re: idea: log_statement_sample_rate - bottom limit for sampling
Previous Message Dagfinn Ilmari Mannsåker 2019-06-19 17:04:27 Re: Remove one last occurrence of "replication slave" in comments