Re: Optimising compactify_tuples()

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising compactify_tuples()
Date: 2020-09-06 11:37:40
Message-ID: CAApHDvoKwqAzhiuxEt8jSquPJKDpH8DNUZDFUSX9P7DXrJdc3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 20 Aug 2020 at 11:28, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> I fixed up the copyright messages, and removed some stray bits of
> build scripting relating to the Perl-generated file. Added to
> commitfest.

I'm starting to look at this. So far I've only just done a quick
performance test on it. With the workload I ran, using 0001+0002.

The test replayed ~2.2 GB of WAL. master took 148.581 seconds and
master+0001+0002 took 115.588 seconds. That's about 28% faster. Pretty
nice!

I found running a lower heap fillfactor will cause quite a few more
heap cleanups to occur. Perhaps that's one of the reasons the speedup
I got was more than the 12% you reported.

More details of the test:

Setup:

drowley(at)amd3990x:~$ cat recoverbench.sh
#!/bin/bash

pg_ctl stop -D pgdata -m smart
pg_ctl start -D pgdata -l pg.log -w
psql -c "drop table if exists t1;" postgres > /dev/null
psql -c "create table t1 (a int primary key, b int not null) with
(fillfactor = 85);" postgres > /dev/null
psql -c "insert into t1 select x,0 from generate_series(1,10000000)
x;" postgres > /dev/null
psql -c "drop table if exists log_wal;" postgres > /dev/null
psql -c "create table log_wal (lsn pg_lsn not null);" postgres > /dev/null
psql -c "insert into log_wal values(pg_current_wal_lsn());" postgres > /dev/null
pgbench -n -f update.sql -t 60000 -c 200 -j 200 -M prepared postgres > /dev/null
psql -c "select 'Used ' ||
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), lsn)) || ' of
WAL' from log_wal limit 1;" postgres
pg_ctl stop -D pgdata -m immediate -w
echo Starting Postgres...
pg_ctl start -D pgdata -l pg.log

drowley(at)amd3990x:~$ cat update.sql
\set i random(1,10000000)
update t1 set b = b+1 where a = :i;

Results:

master

Recovery times are indicated in the postgresql log:

2020-09-06 22:38:58.992 NZST [6487] LOG: redo starts at 3/16E4A988
2020-09-06 22:41:27.570 NZST [6487] LOG: invalid record length at
3/F67F8B48: wanted 24, got 0
2020-09-06 22:41:27.573 NZST [6487] LOG: redo done at 3/F67F8B20

recovery duration = 00:02:28.581

drowley(at)amd3990x:~$ ./recoverbench.sh
waiting for server to shut down.... done
server stopped
waiting for server to start.... done
server started
?column?
---------------------
Used 2333 MB of WAL
(1 row)

waiting for server to shut down.... done
server stopped
Starting Postgres...

recovery profile:
28.79% postgres postgres [.] pg_qsort
13.58% postgres postgres [.] itemoffcompare
12.27% postgres postgres [.] PageRepairFragmentation
8.26% postgres libc-2.31.so [.] 0x000000000018e48f
5.90% postgres postgres [.] swapfunc
4.86% postgres postgres [.] hash_search_with_hash_value
2.95% postgres postgres [.] XLogReadBufferExtended
1.83% postgres postgres [.] PinBuffer
1.80% postgres postgres [.] compactify_tuples
1.71% postgres postgres [.] med3
0.99% postgres postgres [.] hash_bytes
0.90% postgres libc-2.31.so [.] 0x000000000018e470
0.89% postgres postgres [.] StartupXLOG
0.84% postgres postgres [.] XLogReadRecord
0.72% postgres postgres [.] LWLockRelease
0.71% postgres postgres [.] PageGetHeapFreeSpace
0.61% postgres libc-2.31.so [.] 0x000000000018e499
0.50% postgres postgres [.] heap_xlog_update
0.50% postgres postgres [.] DecodeXLogRecord
0.50% postgres postgres [.] pg_comp_crc32c_sse42
0.45% postgres postgres [.] LWLockAttemptLock
0.40% postgres postgres [.] ReadBuffer_common
0.40% postgres [kernel.kallsyms] [k] copy_user_generic_string
0.36% postgres libc-2.31.so [.] 0x000000000018e49f
0.33% postgres postgres [.] SlruSelectLRUPage
0.32% postgres postgres [.] PageAddItemExtended
0.31% postgres postgres [.] ReadPageInternal

Patched v2-0001 + v2-0002:

Recovery times are indicated in the postgresql log:

2020-09-06 22:54:25.532 NZST [13252] LOG: redo starts at 3/F67F8C70
2020-09-06 22:56:21.120 NZST [13252] LOG: invalid record length at
4/D633FCD0: wanted 24, got 0
2020-09-06 22:56:21.120 NZST [13252] LOG: redo done at 4/D633FCA8

recovery duration = 00:01:55.588

drowley(at)amd3990x:~$ ./recoverbench.sh
waiting for server to shut down.... done
server stopped
waiting for server to start.... done
server started
?column?
---------------------
Used 2335 MB of WAL
(1 row)

waiting for server to shut down.... done
server stopped
Starting Postgres...

recovery profile:
32.29% postgres postgres [.] qsort_itemoff
17.73% postgres postgres [.] PageRepairFragmentation
10.98% postgres libc-2.31.so [.] 0x000000000018e48f
5.54% postgres postgres [.] hash_search_with_hash_value
3.60% postgres postgres [.] XLogReadBufferExtended
2.32% postgres postgres [.] compactify_tuples
2.14% postgres postgres [.] PinBuffer
1.39% postgres postgres [.] PageGetHeapFreeSpace
1.38% postgres postgres [.] hash_bytes
1.36% postgres postgres [.] qsort_itemoff_med3
0.94% postgres libc-2.31.so [.] 0x000000000018e499
0.89% postgres postgres [.] XLogReadRecord
0.74% postgres postgres [.] LWLockRelease
0.74% postgres postgres [.] DecodeXLogRecord
0.73% postgres postgres [.] heap_xlog_update
0.66% postgres postgres [.] LWLockAttemptLock
0.65% postgres libc-2.31.so [.] 0x000000000018e470
0.64% postgres postgres [.] pg_comp_crc32c_sse42
0.63% postgres postgres [.] StartupXLOG
0.61% postgres [kernel.kallsyms] [k] copy_user_generic_string
0.60% postgres postgres [.] PageAddItemExtended
0.60% postgres libc-2.31.so [.] 0x000000000018e49f
0.56% postgres libc-2.31.so [.] 0x000000000018e495
0.54% postgres postgres [.] ReadBuffer_common

Settings:
shared_buffers = 10GB
checkpoint_timeout = 1 hour
max_wal_size = 100GB

Hardware:

AMD 3990x
Samsung 970 EVO SSD
64GB DDR4 3600MHz

I'll spend some time looking at the code soon.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2020-09-06 13:26:08 Re: Yet another fast GiST build (typo)
Previous Message Konstantin Knizhnik 2020-09-06 11:05:35 Re: Improving connection scalability: GetSnapshotData()