Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
Date: 2022-08-07 02:12:56
Message-ID: CA+hUKGJKwUrpP0igOFAv5khj3dbHvfWqLzFeo7vtNzDYXv_YZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Aug 7, 2022 at 1:12 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> On Sat, Aug 6, 2022 at 12:11 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> Yeah. pg_pwritev_with_retry can also be part of common/file_utils.c/.h
> so that everyone can use it.
>
> > > Thoughts?
> >
> > Makes sense to me for the WAL segment pre-padding initialization, as
> > we still want to point to the beginning of the segment after we are
> > done with the pre-padding, and the code has an extra lseek().
>
> Thanks. I attached the v1 patch, please review it.

Hi Bharath,

+1 for moving pg_pwritev_with_retry() to file_utils.c, but I think the
commit message should say that is happening. Maybe the move should
even be in a separate patch (IMHO it's nice to separate mechanical
change patches from new logic/work patches).

+/*
+ * A convenience wrapper for pg_pwritev_with_retry() that zero-fills the given
+ * file of size total_sz in batches of size block_sz.
+ */
+ssize_t
+pg_pwritev_with_retry_and_init(int fd, int total_sz, int block_sz)

Hmm, why not give it a proper name that says it writes zeroes?

Size arguments around syscall-like facilities are usually size_t.

+ memset(zbuffer.data, 0, block_sz);

I believe the modern fashion as of a couple of weeks ago is to tell
the compiler to zero-initialise. Since it's a union you'd need
designated initialiser syntax, something like zbuffer = { .data = {0}
}?

+ iov[i].iov_base = zbuffer.data;
+ iov[i].iov_len = block_sz;

How can it be OK to use caller supplied block_sz, when
sizeof(zbuffer.data) is statically determined? What is the point of
this argument?

- dir_data->lasterrno = errno;
+ /* If errno isn't set, assume problem is no disk space */
+ dir_data->lasterrno = errno ? errno : ENOSPC;

This coding pattern is used in places where we blame short writes on
lack of disk space without bothering to retry. But the code used in
this patch already handles short writes: it always retries, until
eventually, if you really are out of disk space, you should get an
actual ENOSPC from the operating system. So I don't think the
guess-it-must-be-ENOSPC technique applies here.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-08-07 02:19:39 Re: Patch to address creation of PgStat* contexts with null parent context
Previous Message Bharath Rupireddy 2022-08-07 01:39:34 Re: Use fadvise in wal replay