From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes? |
Date: | 2022-08-07 15:52:39 |
Message-ID: | CALj2ACVu6Tqda-8m17yxH2dTLz1XC1fSGvFEXvekaKbOhn7Epw@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Aug 7, 2022 at 3:19 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> > A second thing is that pg_pwritev_with_retry_and_write_zeros() is
> > designed to work on WAL segments initialization and it uses
> > XLOG_BLCKSZ and PGAlignedXLogBlock for the job, but there is nothing
> > in its name that tells us so. This makes me question whether
> > file_utils.c is a good location for this second thing. Could a new
> > file be a better location? We have a xlogutils.c in the backend, and
> > a name similar to that in src/common/ would be one possibility.
>
> Yeah, I think it should probably be disconnected from XLOG_BLCKSZ, or
> maybe it's OK to use BLCKSZ with a comment to say that it's a bit
> arbitrary, or maybe it's better to define a new zero buffer of some
> arbitrary size just in this code if that is too strange. We could
> experiment with different size buffers to see how it performs, bearing
> in mind that every time we double it we halve the number of system
> calls, but also bearing in mind that at some point it's too much for
> the stack. I can tell you that the way that code works today was not
> really written with performance in mind (unlike, say, the code
> reverted from 9.4 that tried to do this with posix_fallocate()), it
> was just finding an excuse to call pwritev(), to exercise new fallback
> code being committed for use by later AIO stuff (more patches coming
> soon). The retry support was added because it seemed plausible that
> some system out there would start to do short writes as we cranked up
> the sizes for some implementation reason other than ENOSPC, so we
> should make a reusable retry routine.
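A reusable retry routine of the kind described above has to cope with pwritev() returning fewer bytes than requested. That means skipping the iovecs that were written completely and trimming the one that was written partially. The following is a minimal sketch under POSIX/glibc assumptions. `pwritev_all()` is a hypothetical name, and the sketch is not PostgreSQL's actual pg_pwritev_with_retry().

```c
#define _DEFAULT_SOURCE 1
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write everything described by iov[0..iovcnt-1]
 * starting at offset, retrying after EINTR and short writes.
 * The caller's iovec array is modified in place.
 * Returns the number of bytes written, or -1 with errno set.
 */
static ssize_t
pwritev_all(int fd, struct iovec *iov, int iovcnt, off_t offset)
{
    ssize_t total = 0;

    while (iovcnt > 0)
    {
        ssize_t n = pwritev(fd, iov, iovcnt, offset);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything */
            return -1;
        }
        total += n;
        offset += n;

        /* Drop iovecs that were written completely (and empty ones). */
        size_t left = (size_t) n;
        while (iovcnt > 0 && left >= iov->iov_len)
        {
            left -= iov->iov_len;
            iov++;
            iovcnt--;
        }

        if (iovcnt > 0)
        {
            /*
             * Data remains but nothing was accepted: fail rather than
             * spin forever (an arbitrary choice for this sketch).
             */
            if (n == 0)
            {
                errno = EIO;
                return -1;
            }
            /* Short write: skip the bytes already written in this iovec. */
            iov->iov_base = (char *) iov->iov_base + left;
            iov->iov_len -= left;
        }
    }
    return total;
}
```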
Yes, doubling the zero buffer size to, say, 2 * XLOG_BLCKSZ or 2 * BLCKSZ
halves the number of system calls. Right now, pg_pwritev_with_retry()
gets called 64 times per 16MB WAL file, because each call writes a batch
of 32 blocks (32 * 8kB = 256kB with the default block size).
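The call counts follow from simple arithmetic. The sketch below assumes that each pwritev() call carries 32 iovecs, all pointing at the same zero buffer, and that every call writes in full. `pwritev_calls()` is a hypothetical helper.

```c
/*
 * Number of pwritev() calls needed to zero-fill seg_size bytes when each
 * call carries iov_max iovecs of buf_size bytes (assumes no short writes).
 */
static long
pwritev_calls(long seg_size, long buf_size, long iov_max)
{
    long bytes_per_call = buf_size * iov_max;

    /* Round up so that a partial final batch still counts as one call. */
    return (seg_size + bytes_per_call - 1) / bytes_per_call;
}
```

For a 16MB segment this gives 64 calls with an 8192-byte buffer and 32 calls with a 16384-byte buffer, matching the halving described above.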
Is there a ready-to-use tool or script, or a specific set of pgbench
command-line options or GUC settings, that I can use to measure the
performance?
> I think this should also handle the remainder after processing whole
> blocks, just for completeness. If I call the code as presented with size
> 8193, I think this code will only write 8192 bytes.
Hm, I will fix it.
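Here is a minimal sketch of a zero-fill loop that also covers the remainder. It assumes a standalone zero buffer that is not tied to XLOG_BLCKSZ. The last iovec in each batch is shortened to whatever is left, so a request for 8193 bytes writes 8192 + 1 bytes. `write_zeros()`, `ZERO_BUF_SIZE` and `ZERO_IOV_MAX` are hypothetical names, and the buffer size is an arbitrary choice.

```c
#define _DEFAULT_SOURCE 1
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define ZERO_BUF_SIZE 8192      /* arbitrary, not tied to XLOG_BLCKSZ */
#define ZERO_IOV_MAX  32        /* iovecs per pwritev() call */

/*
 * Hypothetical helper: write size zero bytes at offset.  Returns 0 on
 * success, -1 with errno set on failure.
 */
static int
write_zeros(int fd, off_t offset, size_t size)
{
    static char zbuf[ZERO_BUF_SIZE];    /* static: zeroed, never modified */
    struct iovec iov[ZERO_IOV_MAX];

    while (size > 0)
    {
        int     iovcnt = 0;
        size_t  batch = 0;

        /* Full-size iovecs, then one shorter iovec for any remainder. */
        while (iovcnt < ZERO_IOV_MAX && batch < size)
        {
            size_t len = size - batch;

            if (len > ZERO_BUF_SIZE)
                len = ZERO_BUF_SIZE;
            iov[iovcnt].iov_base = zbuf;
            iov[iovcnt].iov_len = len;
            batch += len;
            iovcnt++;
        }

        ssize_t n = pwritev(fd, iov, iovcnt, offset);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
        {
            errno = EIO;        /* no progress; arbitrary choice */
            return -1;
        }
        /* A short write is handled by rebuilding iovecs from here. */
        offset += n;
        size -= (size_t) n;
    }
    return 0;
}
```

Rebuilding the iovec array on every iteration handles short writes without any separate iovec-trimming logic. Because the zero buffer is static, its size does not add to stack usage; only the small iovec array lives on the stack.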
> I think if this ever needs to work on O_DIRECT files there would be an
> alignment constraint on the buffer and size, but I don't think we have
> to worry about that for now.
We can add a comment about the above limitation, if required.
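If O_DIRECT support is ever needed, the constraint could also be checked explicitly. The sketch below assumes that the buffer address, the length and the file offset must all be multiples of 4096 bytes. The real requirement varies by operating system, filesystem and device. `DIRECT_IO_ALIGN`, `direct_io_ok()` and `alloc_aligned_zero_buffer()` are hypothetical names.

```c
#define _DEFAULT_SOURCE 1
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Assumed alignment; the actual O_DIRECT requirement is platform-specific. */
#define DIRECT_IO_ALIGN 4096

/* Hypothetical check: address, length and offset are all aligned. */
static bool
direct_io_ok(const void *buf, size_t len, off_t offset)
{
    return ((uintptr_t) buf % DIRECT_IO_ALIGN) == 0 &&
           len % DIRECT_IO_ALIGN == 0 &&
           offset % DIRECT_IO_ALIGN == 0;
}

/* Allocate a zeroed buffer whose address meets the assumed alignment. */
static void *
alloc_aligned_zero_buffer(size_t len)
{
    void *p = NULL;

    if (posix_memalign(&p, DIRECT_IO_ALIGN, len) != 0)
        return NULL;
    memset(p, 0, len);
    return p;
}
```

Under this assumption, a trailing remainder such as the last byte of an 8193-byte request could not be written with O_DIRECT at all. That is one reason to document the limitation next to the remainder handling.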
--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/