Re: Make NUM_XLOGINSERT_LOCKS configurable

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, 1111hqshj(at)sina(dot)com, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Make NUM_XLOGINSERT_LOCKS configurable
Date: 2024-01-15 10:54:07
Message-ID: CAKZiRmzVi9Z+bSpqqj44ySd1U1RXKQvM2N5kJ6_n0+09CdaviQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 12, 2024 at 7:33 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Wed, Jan 10, 2024 at 11:43 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> writes:
> > > On Wed, Jan 10, 2024 at 10:00 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > >> Maybe. I bet just bumping up the constant by 2X or 4X or so would get
> > >> most of the win for far less work; it's not like adding a few more
> > >> LWLocks is expensive. But we need some evidence about what to set it to.
> >
> > > I previously made an attempt to improve WAL insertion performance with
> > > varying NUM_XLOGINSERT_LOCKS. IIRC, we will lose what we get by
> > > increasing insertion locks (reduction in WAL insertion lock
> > > acquisition time) to the CPU overhead of flushing the WAL in
> > > WaitXLogInsertionsToFinish as referred to by the following comment.
> >
> > Very interesting --- this is at variance with what the OP said, so
> > we definitely need details about the test conditions in both cases.
> >
> > > Unfortunately, I've lost the test results, I'll run them up again and
> > > come back.
> >
> > Please.
>
> Okay, I'm back with some testing

[..]

> Results with varying NUM_XLOGINSERT_LOCKS (note that we can't allow it
> to be more than MAX_SIMUL_LWLOCKS):
> Locks  TPS    WAL Insert Lock Acquire Time (ms)  WAL Wait for In-progress Inserts to Finish Time (ms)
> 8      18669  12532                              8775
> 16     18076  10641                              13491
> 32     18034  6635                               13997
> 64     17582  3937                               14718
> 128    17782  4563                               20145
>
> Also, check the attached graph. Clearly there's an increase in the
> time spent waiting for in-progress insertions to finish in
> WaitXLogInsertionsToFinish, from 8.7 seconds to 20 seconds, whereas
> the time spent acquiring WAL insertion locks decreased from 12.5
> seconds to 4.5 seconds. Overall, this hasn't resulted in any
> improvement in TPS; in fact, I observed a slight reduction.

Hi, I've hastily tested using Bharath's patches too, as I was thinking
it would be a fast win due to contention; however, it seems that (at
least on fast NVMe devices?) increasing NUM_XLOGINSERT_LOCKS doesn't
help.

With pgbench -P 5 -c 32 -j 32 -T 30 and
- 64 vCPU Lsv2 (AMD EPYC), on a single NVMe device (with ext4) that can
do 100k RW IOPS@8kB (with fio/libaio, 4 jobs)
- shared_buffers = '8GB', max_wal_size = '32GB', track_wal_io_timing = on
- maxed out wal_buffers = '256MB'

tpcb-like with synchronous_commit=off
locks  TPS    wal_insert_lock_acquire_time  wal_wait_for_insert_to_finish_time
8      30393  24087                         128
32     31205  968                           93

tpcb-like with synchronous_commit=on
locks  TPS    wal_insert_lock_acquire_time  wal_wait_for_insert_to_finish_time
8      12031  8472                          10722
32     11957  1188                          12563

tpcb-like with synchronous_commit=on and pgbench -c 64 -j 64
locks  TPS    wal_insert_lock_acquire_time  wal_wait_for_insert_to_finish_time
8      25010  90620                         68318
32     25976  18569                         85319
// same as Bharath said: the time shifted from wal_insert_lock_acquire_time
// to wal_wait_for_insert_to_finish_time

insertonly (largeinserts) with synchronous_commit=off (still -c 32 -j 32)
locks  TPS  wal_insert_lock_acquire_time  wal_wait_for_insert_to_finish_time
8      367  19142                         83
32     393  875                           68

insertonly (largeinserts) with synchronous_commit=on (still -c 32 -j 32)
locks  TPS  wal_insert_lock_acquire_time  wal_wait_for_insert_to_finish_time
8      329  15950                         125
32     310  2177                          316
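
To make the shift between the two timing columns easier to see, the numbers in the tables above can be reduced to relative changes going from 8 to 32 locks. The following is only a minimal sketch in Python: the figures are copied from the tables, while the scenario labels and the function names (pct_change, summarize) are made up for illustration.

```python
# Figures copied from the tables above, keyed by NUM_XLOGINSERT_LOCKS:
# (TPS, wal_insert_lock_acquire_time, wal_wait_for_insert_to_finish_time).
# The scenario labels are illustrative, not part of the original test setup.
RESULTS = {
    "tpcb-like, sync_commit=off, -c 32": {8: (30393, 24087, 128), 32: (31205, 968, 93)},
    "tpcb-like, sync_commit=on, -c 32": {8: (12031, 8472, 10722), 32: (11957, 1188, 12563)},
    "tpcb-like, sync_commit=on, -c 64": {8: (25010, 90620, 68318), 32: (25976, 18569, 85319)},
    "insertonly, sync_commit=off, -c 32": {8: (367, 19142, 83), 32: (393, 875, 68)},
    "insertonly, sync_commit=on, -c 32": {8: (329, 15950, 125), 32: (310, 2177, 316)},
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


def summarize(results):
    """Return (label, TPS %, lock-acquire %, wait-for-inserts %) per scenario."""
    rows = []
    for label, by_locks in results.items():
        base, more = by_locks[8], by_locks[32]
        changes = [round(pct_change(b, m), 1) for b, m in zip(base, more)]
        rows.append((label, *changes))
    return rows


for label, tps, acquire, wait in summarize(RESULTS):
    print(f"{label:36s} TPS {tps:+6.1f}%  acquire {acquire:+6.1f}%  wait {wait:+7.1f}%")
```

Run over these five scenarios, the lock acquisition time drops sharply everywhere while TPS moves by only single-digit percentages in either direction, which is the pattern described above.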

insertonly was := {
create sequence s1;
create table t (id bigint, t text) partition by hash (id);
create table t_h0 partition of t FOR VALUES WITH (modulus 8, remainder 0);
create table t_h1 partition of t FOR VALUES WITH (modulus 8, remainder 1);
create table t_h2 partition of t FOR VALUES WITH (modulus 8, remainder 2);
create table t_h3 partition of t FOR VALUES WITH (modulus 8, remainder 3);
create table t_h4 partition of t FOR VALUES WITH (modulus 8, remainder 4);
create table t_h5 partition of t FOR VALUES WITH (modulus 8, remainder 5);
create table t_h6 partition of t FOR VALUES WITH (modulus 8, remainder 6);
create table t_h7 partition of t FOR VALUES WITH (modulus 8, remainder 7);

and the pgbench script at runtime:
insert into t select nextval('s1'), repeat('A', 1000) from generate_series(1, 1000);
}
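
The eight partition statements above all follow one pattern, so they can be generated when trying a different partition count. This is just a sketch, assuming the same table name and naming scheme (t, t_h0 .. t_hN) as above; the helper name partition_ddl is hypothetical.

```python
def partition_ddl(parent="t", modulus=8):
    """Build one CREATE TABLE ... PARTITION OF statement per hash remainder."""
    return [
        f"create table {parent}_h{r} partition of {parent} "
        f"FOR VALUES WITH (modulus {modulus}, remainder {r});"
        for r in range(modulus)
    ]


# Print the statements so they can be piped into psql.
print("\n".join(partition_ddl()))
```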

The table was truncated and the DB was checkpointed before every run; of course, this was on master.
Without more details from Qingsong it is going to be hard to explain
the boost he witnessed.

-J.
