Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
Date: 2022-09-24 04:14:20
Message-ID: CA+hUKGL8RneKhL5u6giWbDcna1puD5_6OaUgxSjnq10pw4Es=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 24, 2022 at 8:24 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> +#ifdef WIN32
> + /*
> + * XXX: It looks like on Windows, we need an explicit lseek() call here
> + * despite using pwrite() implementation from win32pwrite.c. Otherwise
> + * an error occurs.
> + */
>
> I think this comment is too vague. Can we describe the error in more
> detail? Or better yet, can we fix it as a prerequisite to this patch set?

Although WriteFile() with a synchronous file handle and an explicit
offset doesn't use the current file position, it appears that it still
changes it. :-(

You'd think from the documentation[1] that that isn't the case, because it says:

"Considerations for working with synchronous file handles:

* If lpOverlapped is NULL, the write operation starts at the current
file position and WriteFile does not return until the operation is
complete, and the system updates the file pointer before WriteFile
returns.

* If lpOverlapped is not NULL, the write operation starts at the
offset that is specified in the OVERLAPPED structure and WriteFile
does not return until the write operation is complete. The system
updates the OVERLAPPED Internal and InternalHigh fields before
WriteFile returns."

So it's explicitly saying the file pointer is updated in the first
bullet point and not the second, but src/port/win32pwrite.c is doing
the second thing. Yet the following assertion added to Bharath's code
fails[2]:

+++ b/src/bin/pg_basebackup/walmethods.c
@@ -221,6 +221,10 @@ dir_open_for_write(WalWriteMethod *wwmethod,
const char *pathname,
if (pad_to_size && wwmethod->compression_algorithm ==
PG_COMPRESSION_NONE)
{
ssize_t rc;
+ off_t before_offset;
+ off_t after_offset;
+
+ before_offset = lseek(fd, 0, SEEK_CUR);

rc = pg_pwritev_zeros(fd, pad_to_size);

@@ -231,6 +235,9 @@ dir_open_for_write(WalWriteMethod *wwmethod, const
char *pathname,
return NULL;
}

+ after_offset = lseek(fd, 0, SEEK_CUR);
+ Assert(before_offset == after_offset);
+

[1] https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile#synchronization-and-file-position
[2] https://api.cirrus-ci.com/v1/artifact/task/6201051266154496/testrun/build/testrun/pg_basebackup/010_pg_basebackup/log/regress_log_010_pg_basebackup

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-09-24 11:40:06 Re: Perform streaming logical transactions by background workers and parallel apply
Previous Message Jacob Champion 2022-09-23 22:39:19 Re: [PoC] Federated Authn/z with OAUTHBEARER