
Re: strange perf regression with data checksums

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: strange perf regression with data checksums
Date:	2025-05-22 12:56:33
Message-ID:	bd8d04ec-11f9-443c-b431-c3f65ab04b96@vondra.me
Lists:	pgsql-hackers

Hi,

I finally had time to do more rigorous testing on the v1/v2 patches.
Attached is a .tgz with a test script that initializes pgbench at scale 1,
and then:

* Modifies the data to have different patterns / number of matching
rows, etc. This is done by scripts in the init/ directory.

* Runs queries that either match or do not match any rows. This is
done by scripts in the select/ directory.

* Runs each test with 32, 64 and 96 clients (the system has ~96 cores)

The scripts also force a particular scan type (bitmap/index/index-only),
and may also pin the processes to CPUs in different ways:

* default = no pinning, it's up to the scheduler
* colocated = pgbench/backend always on the same core
* random = pgbench/backend always on a different random core

This is done by a custom pgbench patch (can share, if needed). I found
the pinning may have a *massive* impact in some cases.
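
To make the three pinning modes concrete, here is a minimal Python sketch
of how a core pair could be chosen for one pgbench/backend pair. It is
not the pgbench patch (that is not included here); the function name
pick_cores, picking the "colocated" core at random, and the way the
"random" mode avoids reusing the client core are all assumptions made
for illustration.

```python
import random

def pick_cores(mode, ncores, rng):
    """Return (client_core, backend_core) for one pgbench/backend pair,
    or None when the processes are left unpinned.

    Hypothetical helper: the real pgbench patch may choose cores differently.
    """
    if mode == "default":
        return None  # no pinning, placement is up to the scheduler
    client = rng.randrange(ncores)
    if mode == "colocated":
        return (client, client)  # both processes share one core
    if mode == "random":
        if ncores < 2:
            raise ValueError("random mode needs at least two cores")
        # Offset in 1..ncores-1, so the backend core always differs
        # from the client core.
        backend = (client + 1 + rng.randrange(ncores - 1)) % ncores
        return (client, backend)
    raise ValueError(f"unknown pinning mode: {mode}")

# On Linux the chosen cores could then be applied with
# os.sched_setaffinity(pid, {core}); that step is left out here.
rng = random.Random(42)
for mode in ("default", "colocated", "random"):
    print(mode, pick_cores(mode, 96, rng))
```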

There's also a CSV with raw results, and two PDF files with a summary of
the results:

* results-relative-speedup-vs-master.pdf - Shows throughput relative
to master (for the same client count), 100% means no difference.

* results-relative-speedup-vs-32.pdf - Slightly different view on the
data, showing "scalability" for a given build. It compares
throughput to "expected" multiple of the result we got for 32
clients. 100% means linear scalability.
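
For reference, the two percentages described above can be sketched as
follows. This is only my reading of the descriptions of the two PDFs,
not the script that produced them; the function names and the sample
throughput numbers are made up for illustration.

```python
def relative_to_master(tps_patched, tps_master):
    """Throughput of a patched build as a percentage of master,
    for the same client count. 100 means no difference."""
    return 100.0 * tps_patched / tps_master

def scalability_vs_32(tps, clients, tps_at_32):
    """Throughput as a percentage of the 'expected' value, i.e. the
    32-client result scaled linearly to the given client count.
    100 means linear scalability, above 100 is superlinear."""
    expected = tps_at_32 * clients / 32
    return 100.0 * tps / expected

# Made-up numbers, only to show how the values are read.
print(relative_to_master(tps_patched=120_000, tps_master=100_000))
print(scalability_vs_32(tps=330_000, clients=96, tps_at_32=100_000))
```

With these definitions, a 96-client run counts as superlinear exactly when
its throughput exceeds 3x the 32-client throughput of the same build.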

As usual, green=good, red=bad. My observation is that v2 performs better
than v1 (more green, darker green). v2 helps even in cases where v1 did
not make any difference (e.g. some of the "nomatch" cases).

It's also interesting how much impact the pinning has - the "colocated"
results are much better. It's also interesting that in a couple of cases we
scale superlinearly, i.e. 96 clients get more than 3x the throughput of 32
clients.

I've seen this before, and I believe it's due to behavior of the
hardware, and some kernel optimizations. Perhaps there's something we
could learn from this, not sure.

Anyway, as a comparison of v1 and v2 I think this is enough.

regards

--
Tomas Vondra

Attachment	Content-Type	Size
results-relative-speedup-vs-32.pdf	application/pdf	64.5 KB
results-relative-speedup-vs-master.pdf	application/pdf	62.8 KB
test-scripts.tgz	application/x-compressed-tar	33.7 KB

In response to

Re: strange perf regression with data checksums at 2025-05-20 12:46:25 from Peter Geoghegan

Responses

Re: strange perf regression with data checksums at 2025-06-04 11:33:06 from Tomas Vondra
