| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Victor Yegorov <vyegorov(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Moving _bt_readpage and _bt_checkkeys into a new .c file |
| Date: | 2025-12-06 20:04:20 |
| Message-ID: | CAH2-Wz=nntKkr2wAK92zyiaSMzfQa81TqA0HtezwxXoRKYB6+A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Sat, Dec 6, 2025 at 3:07 AM Victor Yegorov <vyegorov(at)gmail(dot)com> wrote:
> I like this change and I agree that it's both handy and gives an easy performance boost.
There's a number of things that I find counterintuitive about the
performance impact of this patch:
* It doesn't make very much difference (under a 1% improvement) on
similar index-only scans, with queries such as "SELECT count(*) FROM
pgbench_accounts WHERE aid between :aid AND :endrange". One might
expect this case to be improved at least as much as the plain index
scan case, since the relative cost of _bt_readpage is higher (since
it's much cheaper to access the visibility map than to access the
heap).
* The patch makes an even larger difference when I make pgbench use a
larger range of values in the "between". For example, if I increase
the number of values/tuples returned from 2,000 (which is what I used
initially/reported on in the first email) to 15,000, I find that the
patch increases TPS by as much as 5.5%.
* These queries make maximum use of the _bt_set_startikey optimization
-- most individual leaf pages don't need to evaluate any scan key
(after an initial page-level check within _bt_set_startikey). So the
patch really helps in exactly those cases where we don't truly need to
access the scan direction at all -- the loop inside _bt_check_compare
always has 0 iterations with these queries, which means that scan
direction doesn't actually ever need to be considered at that point.
My best guess is that the benefits I see come from eliminating a
dependent load. Without the second patch applied, I see this
disassembly for _bt_checkkeys:
mov rax,QWORD PTR [rdi+0x38] ; Load scan->opaque
mov r15d,DWORD PTR [rax+0x70] ; Load so->dir
A version with the second patch applied still loads a pointer passed
by the _bt_checkkeys caller (_bt_readpage), but doesn't have to chase
another pointer to get to it. Maybe this significantly ameliorates
execution port pressure in the cases where I see a speedup?
> Patch applies and compiles cleanly. I can barely see a performance boost on my end (VM on a busy host), round 1%, but I still consider this change beneficial.
It seems to have no downsides, and some upside. I wouldn't be
surprised if the results I'm seeing are dependent on
microarchitectural details. I myself use a Zen 3 chip (a Ryzen 9
5950X).
--
Peter Geoghegan
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2025-12-06 20:40:25 | Re: Something in our JIT code is screwing up PG_PRINTF_ATTRIBUTE |
| Previous Message | Kirill Reshke | 2025-12-06 19:42:14 | Re: [PATCH] Fix typo in psql \copy command documentation |