Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
Date: 2022-08-09 07:30:35
Message-ID: CALj2ACUccjR7KbKqWOsQmqH1ZGEDyJ7hH5Ef+DOhcv7+kOnjCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 8, 2022 at 6:10 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> I played with a simple insert use-case [1] that generates ~380 WAL
> files, with different block sizes. To my surprise, I have not seen any
> improvement with larger block sizes. I may be doing something wrong
> here, suggestions on to test and see the benefits are welcome.
>
> > > I think this should also handle the remainder after processing whole
> > > blocks, just for completeness. If I call the code as presented with size
> > > 8193, I think this code will only write 8192 bytes.
> >
> > Hm, I will fix it.
>
> Fixed.
>
> I'm attaching v5 patch-set. I've addressed review comments received so
> far and fixed a compiler warning that CF bot complained about.
>
> Please review it further.

I tried to vary the zero buffer size to see if there's any huge
benefit for the WAL-generating queries. Unfortunately, I didn't see
any benefit on my dev system (16 vcore, 512GB SSD, 32GB RAM) . The use
case I've tried is at [1] and the results are at [2].

Having said that, the use of pg_pwritev_with_retry() in walmethods.c
will definitely reduce number of system calls - on HEAD the
dir_open_for_write() makes pad_to_size/XLOG_BLCKSZ i.e. 16MB/8KB =
2,048 write() calls and with patch it makes only 64
pg_pwritev_with_retry() calls with XLOG_BLCKSZ zero buffer size. The
proposed patches will provide straight 32x reduction in system calls
(for pg_receivewal and pg_basebackup) apart from the safety against
partial writes.

[1]
/* built source code with release flags */
./configure --with-zlib --enable-depend --prefix=$PWD/inst/
--with-openssl --with-readline --with-perl --with-libxml CFLAGS='-O2'
> install.log && make -j 8 install > install.log 2>&1 &

\q
./pg_ctl -D data -l logfile stop
rm -rf data

/* ensured that nothing exists in OS page cache */
free -m
sudo su
sync; echo 3 > /proc/sys/vm/drop_caches
exit
free -m

./initdb -D data
./pg_ctl -D data -l logfile start
./psql -d postgres -c 'ALTER SYSTEM SET max_wal_size = "64GB";'
./psql -d postgres -c 'ALTER SYSTEM SET shared_buffers = "8GB";'
./psql -d postgres -c 'ALTER SYSTEM SET work_mem = "16MB";'
./psql -d postgres -c 'ALTER SYSTEM SET checkpoint_timeout = "1d";'
./pg_ctl -D data -l logfile restart
./psql -d postgres -c 'create table foo(bar int);'
./psql -d postgres
\timing
insert into foo select * from generate_series(1, 100000000); /* this
query generates about 385 WAL files, no checkpoint hence no recycle of
old WAL files, all new WAL files */

[2]
HEAD
Time: 84249.535 ms (01:24.250)

HEAD with wal_init_zero off
Time: 75086.300 ms (01:15.086)

#define PWRITEV_BLCKSZ XLOG_BLCKSZ
Time: 85254.302 ms (01:25.254)

#define PWRITEV_BLCKSZ (4 * XLOG_BLCKSZ)
Time: 83542.885 ms (01:23.543)

#define PWRITEV_BLCKSZ (16 * XLOG_BLCKSZ)
Time: 84035.770 ms (01:24.036)

#define PWRITEV_BLCKSZ (64 * XLOG_BLCKSZ)
Time: 84749.021 ms (01:24.749)

#define PWRITEV_BLCKSZ (256 * XLOG_BLCKSZ)
Time: 84273.466 ms (01:24.273)

#define PWRITEV_BLCKSZ (512 * XLOG_BLCKSZ)
Time: 84233.576 ms (01:24.234)

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2022-08-09 08:12:08 Re: Using each rel as both outer and inner for JOIN_ANTI
Previous Message Peter Smith 2022-08-09 07:14:54 Re: Data is copied twice when specifying both child and parent table in publication