From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: strange perf regression with data checksums
Date: 2025-05-22 12:56:33
Message-ID: bd8d04ec-11f9-443c-b431-c3f65ab04b96@vondra.me
Lists: pgsql-hackers
Hi,
I finally had time to do more rigorous testing on the v1/v2 patches.
Attached is a .tgz with a test script that initializes pgbench at scale 1
and then does the following (a rough sketch of the commands is below):
* Modifies the data to have different patterns / numbers of matching
rows, etc. This is done by the scripts in the init/ directory.
* Runs queries that either match or do not match any rows. This is
done by the scripts in the select/ directory.
* Runs with 32, 64 and 96 clients (the system has ~96 cores).
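For illustration, the runs look roughly like this (simplified, with
hypothetical database and script names; the actual scripts are in the
attached tarball):

  # initialize pgbench at scale 1
  pgbench -i -s 1 testdb

  # apply one of the data-modification scripts (hypothetical name)
  psql testdb -f init/update-data.sql

  # run one of the select scripts, here with 32 clients for 60 seconds
  pgbench -n -f select/query.sql -c 32 -j 32 -T 60 testdb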
The scripts also force a particular scan type (bitmap / index / index-only,
see the sketch below), and may also pin the processes to CPUs in different
ways:
* default = no pinning, it's up to the scheduler
* colocated = pgbench/backend always on the same core
* random = pgbench/backend always on a different random core
This is done by a custom pgbench patch (which I can share, if needed). I
found that the pinning can have a *massive* impact in some cases.
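As for forcing the scan types, the usual approach is to disable the
competing scan methods via the planner GUCs. A minimal sketch of what a
select script might look like (the predicate is just a placeholder, the
actual queries differ per script):

  -- steer the planner towards a bitmap heap scan by disabling alternatives
  SET enable_seqscan = off;
  SET enable_indexscan = off;
  SET enable_indexonlyscan = off;
  -- enable_bitmapscan is left on, so the planner strongly prefers it
  SELECT count(*) FROM pgbench_accounts WHERE abalance = 0;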
There's also a CSV with the raw results, and two PDF files with a summary
of the results:
* results-relative-speedup-vs-master.pdf - Shows throughput relative
to master (for the same client count); 100% means no difference.
* results-relative-speedup-vs-32.pdf - A slightly different view of the
data, showing "scalability" for a given build. It compares the
throughput to the "expected" multiple of the result we got for 32
clients; 100% means linear scalability.
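Put differently, the numbers in the two PDFs are roughly:

  speedup vs. master (N clients) = tps_patched(N) / tps_master(N) * 100%
  scalability (N clients)        = tps(N) / ((N / 32) * tps(32)) * 100%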
As usual, green=good, red=bad. My observation is that v2 performs better
than v1 (more green, darker green). v2 helps even in cases where v1 did
not make any difference (e.g. some of the "nomatch" cases).
It's also interesting how much impact the pinning has - the "colocated"
results are much better. Another interesting thing is that in a couple of
cases we scale superlinearly, i.e. 96 clients achieve more than 3x the
throughput of 32 clients.
I've seen this before, and I believe it's due to the behavior of the
hardware and some kernel optimizations. Perhaps there's something we
could learn from this, but I'm not sure.
Anyway, as a comparison of v1 and v2 I think this is enough.
regards
--
Tomas Vondra
Attachments:
* results-relative-speedup-vs-32.pdf (application/pdf, 64.5 KB)
* results-relative-speedup-vs-master.pdf (application/pdf, 62.8 KB)
* test-scripts.tgz (application/x-compressed-tar, 33.7 KB)