Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
Date: 2022-09-26 15:03:53
Message-ID: CALj2ACW5Cy2uC_Hu1V-aq5uJx+4i=UDabQr_WvQwOPsPQULEEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 24, 2022 at 9:44 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> Although WriteFile() with a synchronous file handle and an explicit
> offset doesn't use the current file position, it appears that it still
> changes it. :-(
>
> You'd think from the documentation[1] that that isn't the case, because it says:
>
> "Considerations for working with synchronous file handles:
>
> * If lpOverlapped is NULL, the write operation starts at the current
> file position and WriteFile does not return until the operation is
> complete, and the system updates the file pointer before WriteFile
> returns.
>
> * If lpOverlapped is not NULL, the write operation starts at the
> offset that is specified in the OVERLAPPED structure and WriteFile
> does not return until the write operation is complete. The system
> updates the OVERLAPPED Internal and InternalHigh fields before
> WriteFile returns."
>
> So it's explicitly saying the file pointer is updated in the first
> bullet point and not the second, but src/port/win32pwrite.c is doing
> the second thing.

The WriteFile() and pwrite() implementation in win32pwrite.c are
changing the file pointer on Windows, see the following and [4] for
more details.

# Running: pg_basebackup --no-sync -cfast -D
C:\cirrus\build\testrun\pg_basebackup\010_pg_basebackup\data\tmp_test_sV4r/backup2
--no-manifest --waldir
C:\cirrus\build\testrun\pg_basebackup\010_pg_basebackup\data\tmp_test_sV4r/xlog2
pg_basebackup: offset_before 0 and offset_after 16777216 aren't the same
Assertion failed: offset_before == offset_after, file
../src/bin/pg_basebackup/walmethods.c, line 254

Irrespective of what Windows does with file pointers in WriteFile(),
should we add lseek(SEEK_SET) in our own pwrite()'s implementation,
something like [5]? This is rather hackish without fully knowing what
Windows does internally in WriteFile(), but this does fix inherent
issues that our pwrite() callers (there are quite a number of places
that use pwrite() and presumes file pointer doesn't change on Windows)
may have on Windows. See the regression tests passing [6] with the fix
[5].

> Yet the following assertion added to Bharath's code
> fails[2]:

That was a very quick patch though, here's another version adding
offset checks and an assertion [3], see the assertion failures here
[4].

I also think that we need to discuss this issue separately.

Thoughts?

> [1] https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile#synchronization-and-file-position
> [2] https://api.cirrus-ci.com/v1/artifact/task/6201051266154496/testrun/build/testrun/pg_basebackup/010_pg_basebackup/log/regress_log_010_pg_basebackup
[3] - https://github.com/BRupireddy/postgres/tree/add_pwrite_and_offset_checks_in_walmethods_v2
[4] - https://api.cirrus-ci.com/v1/artifact/task/5294264635621376/testrun/build/testrun/pg_basebackup/010_pg_basebackup/log/regress_log_010_pg_basebackup
[5]
diff --git a/src/port/win32pwrite.c b/src/port/win32pwrite.c
index 7f2e62e8a7..542b548279 100644
--- a/src/port/win32pwrite.c
+++ b/src/port/win32pwrite.c
@@ -37,5 +37,16 @@ pwrite(int fd, const void *buf, size_t size, off_t offset)
return -1;
}

+ /*
+ * On Windows, it is found that WriteFile() changes the file
pointer and we
+ * want pwrite() to not change. Hence, we explicitly reset the
file pointer
+ * to beginning of the file.
+ */
+ if (lseek(fd, 0, SEEK_SET) != 0)
+ {
+ _dosmaperr(GetLastError());
+ return -1;
+ }
+
return result;
}
[6] - https://github.com/BRupireddy/postgres/tree/add_pwrite_and_offset_checks_in_walmethods_v3

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonathan S. Katz 2022-09-26 15:06:51 PostgreSQL 15 GA release date
Previous Message Peter Eisentraut 2022-09-26 14:55:22 Re: Convert *GetDatum() and DatumGet*() macros to inline functions