| From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | John Naylor <johncnaylorls(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: More speedups for tuple deformation |
| Date: | 2026-03-01 13:10:56 |
| Message-ID: | CAApHDvq21qQigiM6z2YgadFusQC_pfEYP8D=oQCrwJ_kKzcqDg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Thu, 26 Feb 2026 at 09:29, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Huh. It, at least partially, seems to be related to using an integer for
> attnum et al. Due to us using -fwrapv, the compiler can't actually assume that
> an attnum++ won't overflow. An overflow would make the loop trip counts a lot
> more complicated. Even with that I don't understand how it ends up
> generating such crappy code, but since using size_t fixes it...
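The change discussed in the quote above can be sketched as follows. This is only an illustration of switching a loop counter from `int` to `size_t`; `DemoAttr`, `sum_len_int` and `sum_len_size_t` are hypothetical names, not code from the patch. A loop this small may well compile identically in both forms; to compare, build with something like `gcc -O2 -fwrapv -S` and inspect the output.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-attribute array entry (assumption,
 * not PostgreSQL's CompactAttribute). */
typedef struct DemoAttr
{
    short attlen;
} DemoAttr;

/*
 * Signed int counter. Under -fwrapv, attnum++ is defined to wrap on
 * overflow, so the compiler may not treat overflow as impossible, and
 * the index must be widened to pointer width for each array access.
 */
static size_t
sum_len_int(const DemoAttr *attrs, int start, int natts)
{
    size_t off = 0;

    for (int attnum = start; attnum < natts; attnum++)
        off += (size_t) attrs[attnum].attlen;
    return off;
}

/*
 * size_t counter. The index already has pointer width, so no widening
 * is needed when it is used to address the array.
 */
static size_t
sum_len_size_t(const DemoAttr *attrs, size_t start, size_t natts)
{
    size_t off = 0;

    for (size_t attnum = start; attnum < natts; attnum++)
        off += (size_t) attrs[attnum].attlen;
    return off;
}
```

Both functions compute the same result; only the counter type differs, which is the variable the quoted paragraph suspects of affecting code generation.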
Thanks. That seems to make the gcc-compiled version quite a bit better.
I am still seeing some register spilling, as the TupleDesc is
written to the stack and reloaded into a register a couple of
times. I've attached the objdump in question.
    if (attnum < firstNonGuaranteedAttr)
    1c3c:  48 39 e8          cmp    rax,rbp
    1c3f:  73 7f             jae    1cc0 <tts_heap_getsomeattrs+0x110>
    1c41:  48 89 54 24 f0    mov    QWORD PTR [rsp-0x10],rdx
    1c46:  48 8d 74 c2 20    lea    rsi,[rdx+rax*8+0x20]
The TupleDesc is then reloaded into the register at:
    off += cattr->attlen;
    1f88:  48 8b 54 24 f0    mov    rdx,QWORD PTR [rsp-0x10]
I've not found a way to stop gcc from doing this.
I've also resequenced the patches so that 0002 contains the sibling call
optimisation for slot_getsomeattrs(), and I've applied the tail
call optimisation that you mentioned for slot_getmissingattrs() in
0004.
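For readers unfamiliar with the term, a sibling (tail) call can be sketched roughly as below. Everything here is hypothetical (`DemoSlot`, `demo_fill_missing`, `demo_getsomeattrs`) and is not the patch's code; it only shows the general shape, where the call is the last thing the caller does so the compiler may replace call-plus-return with a jump. Whether gcc actually does so depends on optimisation level, inlining and the calling convention.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified slot (assumption, not TupleTableSlot). */
typedef struct DemoSlot
{
    int nvalid;          /* number of values already filled in */
    int natts_on_tuple;  /* attributes physically present in the tuple */
    int values[8];
} DemoSlot;

/* Fill attributes the tuple does not contain with a placeholder value. */
static void
demo_fill_missing(DemoSlot *slot, int start, int end)
{
    for (int i = start; i < end; i++)
        slot->values[i] = -1;
    slot->nvalid = end;
}

static void
demo_getsomeattrs(DemoSlot *slot, int natts)
{
    int avail = natts < slot->natts_on_tuple ? natts : slot->natts_on_tuple;

    /* "Deform" the attributes that are present in the tuple. */
    for (int i = slot->nvalid; i < avail; i++)
        slot->values[i] = i;
    if (avail > slot->nvalid)
        slot->nvalid = avail;

    if (avail >= natts)
        return;

    /*
     * Nothing happens after this call and both functions return void,
     * so the call is in tail position and is a candidate for sibling
     * call optimisation.
     */
    demo_fill_missing(slot, avail, natts);
}
```

Any work done after the final call, such as updating a field on return, would take the call out of tail position and rule out the optimisation.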
Benchmark results are in the attached spreadsheet.
David
| Attachment | Content-Type | Size |
|---|---|---|
| tts_heap_getsomeattrs_objdump_Mintel.txt | text/plain | 63.8 KB |
| v11-0001-Introduce-deform_bench-test-module.patch | text/plain | 7.3 KB |
| v11-0002-Allow-sibling-call-optimization-in-slot_getsomea.patch | text/plain | 7.3 KB |
| v11-0005-Reduce-size-of-CompactAttribute-struct-to-8-byte.patch | text/plain | 5.5 KB |
| v11-0003-Add-empty-TupleDescFinalize-function.patch | text/plain | 29.0 KB |
| v11-0004-Optimize-tuple-deformation.patch | text/plain | 66.9 KB |
| Deform_bench_test_module_results_v11.xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | 36.7 KB |