Re: More speedups for tuple deformation

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: More speedups for tuple deformation
Date: 2026-03-01 13:10:56
Message-ID: CAApHDvq21qQigiM6z2YgadFusQC_pfEYP8D=oQCrwJ_kKzcqDg@mail.gmail.com
Lists: pgsql-hackers

On Thu, 26 Feb 2026 at 09:29, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Huh. It, at least partially, seems to be related to using an integer for
> attnum et al. Due to us using -fwrapv, the compiler can't actually assume that
> an attnum++ won't overflow. An overflow would make the loop trip counts a lot
> more complicated. Even with that I don't understand how it ends up
> generating such crappy code, but since using size_t fixes it...

Thanks. That seems to make the gcc compiled version quite a bit better.
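
To illustrate the point quoted above, here is a minimal sketch, not code from the patch set. DemoAttr and both function names are hypothetical stand-ins. With -fwrapv, the int counter in the first loop is defined to wrap on overflow, so the compiler has to allow for that when it widens the counter to index the array. The size_t counter in the second loop is already pointer-width on typical 64-bit targets, so no widening is needed. How much this changes the generated code depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-attribute descriptor; not PostgreSQL's struct. */
typedef struct DemoAttr
{
    int16_t attlen;             /* fixed attribute length in bytes */
} DemoAttr;

/*
 * int loop counter: under -fwrapv, attnum++ is defined to wrap around, so
 * the compiler cannot simply assume the counter never overflows.
 */
size_t
sum_attlen_int(const DemoAttr *attrs, int natts)
{
    size_t off = 0;

    for (int attnum = 0; attnum < natts; attnum++)
        off += (size_t) attrs[attnum].attlen;
    return off;
}

/*
 * size_t loop counter: same loop, but the counter already has the width
 * used for address arithmetic on typical 64-bit targets.
 */
size_t
sum_attlen_size_t(const DemoAttr *attrs, size_t natts)
{
    size_t off = 0;

    for (size_t attnum = 0; attnum < natts; attnum++)
        off += (size_t) attrs[attnum].attlen;
    return off;
}
```

Both functions return the same result for the same input; comparing the output of gcc -O2 -fwrapv -S for the two is one way to see whether the counter type matters on a given compiler.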

I am still seeing some register spilling: the TupleDesc is written to
the stack and reloaded into a register a couple of times. I've attached
the objdump in question. The spill happens here:

    if (attnum < firstNonGuaranteedAttr)
    1c3c:  48 39 e8          cmp    rax,rbp
    1c3f:  73 7f             jae    1cc0 <tts_heap_getsomeattrs+0x110>
    1c41:  48 89 54 24 f0    mov    QWORD PTR [rsp-0x10],rdx
    1c46:  48 8d 74 c2 20    lea    rsi,[rdx+rax*8+0x20]

The TupleDesc is then reloaded into the register at:

    off += cattr->attlen;
    1f88:  48 8b 54 24 f0    mov    rdx,QWORD PTR [rsp-0x10]

I've not found a way to stop gcc from doing this.

I've also resequenced the patches so 0002 contains the sibling call
optimisation for slot_getmissingattrs() and I've applied that tail
call optimisation that you mentioned for slot_getmissingattrs() in
0004.
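
For readers unfamiliar with the term, a sibling (tail) call is a call in tail position whose result is returned unchanged, which lets the compiler jump to the callee instead of calling it and returning. The following is a minimal sketch of that shape only. demo_fill_missing and demo_getsomeattrs are hypothetical names, not the functions in the patches, and whether a jump is actually emitted depends on the compiler, optimization level and ABI.

```c
#include <stddef.h>

/* Hypothetical slow path: mark attributes [start, natts) with a placeholder. */
int
demo_fill_missing(int *values, int start, int natts)
{
    for (int i = start; i < natts; i++)
        values[i] = -1;         /* placeholder for a "missing" value */
    return natts;
}

/*
 * Hypothetical caller. The call to demo_fill_missing() is the last thing
 * the function does and its result is returned as-is, which is the form
 * that makes the call a candidate for sibling call optimization.
 */
int
demo_getsomeattrs(int *values, int nvalid, int natts)
{
    if (nvalid >= natts)
        return nvalid;          /* nothing to fill in */

    return demo_fill_missing(values, nvalid, natts);
}
```

Doing any work after the call, such as modifying the returned value or storing to a local that outlives the call, would take the call out of tail position.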

Benchmark results are in the attached spreadsheet.

David

Attachment                                                        Size     Content-Type
tts_heap_getsomeattrs_objdump_Mintel.txt                          63.8 KB  text/plain
v11-0001-Introduce-deform_bench-test-module.patch                 7.3 KB   text/plain
v11-0002-Allow-sibling-call-optimization-in-slot_getsomea.patch   7.3 KB   text/plain
v11-0005-Reduce-size-of-CompactAttribute-struct-to-8-byte.patch   5.5 KB   text/plain
v11-0003-Add-empty-TupleDescFinalize-function.patch               29.0 KB  text/plain
v11-0004-Optimize-tuple-deformation.patch                         66.9 KB  text/plain
Deform_bench_test_module_results_v11.xlsx                         36.7 KB  application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
