| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
| Cc: | John Naylor <johncnaylorls(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: More speedups for tuple deformation |
| Date: | 2026-02-25 20:29:01 |
| Message-ID: | uhqul2ryci4tyg5ylddjrmf4kybzwb7m5z7rmurhhjp37vrn5f@zgxil7egr62n |
| Lists: | pgsql-hackers |
Hi,
On 2026-02-25 13:05:14 -0500, Andres Freund wrote:
> At least gcc is doing some truly weird shit in the
> firstNonGuaranteed/firstNonCachedOffsetAttr loop "header" (i.e. just before
> the first entrance to the loop), which leads to the register pressure being
> high, which leads to spilling on the stack, making the few-tuples case slower:
>
> [ lots of stuff trimmed ]
>
> I.e. the compiler creates an offset version of tts_values[tts_nvalid],
> tts_isnull[tts_nvalid], which then creates register allocation pressure,
> because later the original tts_values/tts_isnull etc are accessed again and
> thus the underlying registers are preserved. And this is all for zero gain,
> from what I can tell, because the accesses are still done with indexed
> addressing (like mov %rdi,(%r12,%rcx,8)), which would work just as
> well if rcx were indexed based on attnum, not zero indexed within the loop.
>
> I see about a 10% improvement if I dissuade the compiler from doing that by
> adding
> __asm__ volatile ("" : "+r"(attnum) : :);
>
> In the loop body.
>
>
> I'm getting to the point where I'd like to just hand write the assembler for
> this stupid function. Gah.
Huh. It, at least partially, seems to be related to using an int for
attnum et al. Due to us using -fwrapv, signed overflow is defined to wrap, so
the compiler can't actually assume that an attnum++ won't overflow. An
overflow would make the loop trip counts a lot more complicated. Even with
that I don't understand how it ends up generating such crappy code, but since
using size_t fixes it...
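
For illustration, a minimal sketch of the three loop shapes discussed above: an
int counter, an int counter with the empty asm barrier, and a size_t counter.
DemoSlot and the fill_* functions are hypothetical stand-ins, not the actual
deforming code, and the sketch only shows that the variants are functionally
equivalent. Whether a given compiler version shows the codegen difference would
have to be checked in the generated assembly (e.g. built with -O2 -fwrapv).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the slot fields mentioned above. */
typedef struct DemoSlot
{
    uintptr_t  *values;     /* plays the role of tts_values */
    bool       *isnull;     /* plays the role of tts_isnull */
    int         nvalid;     /* plays the role of tts_nvalid */
} DemoSlot;

/* Variant 1: int counter. With -fwrapv, attnum++ is defined to wrap. */
static void
fill_int(DemoSlot *slot, const uintptr_t *src, int natts)
{
    assert(slot->nvalid >= 0 && slot->nvalid <= natts);
    for (int attnum = slot->nvalid; attnum < natts; attnum++)
    {
        slot->values[attnum] = src[attnum];
        slot->isnull[attnum] = false;
    }
    slot->nvalid = natts;
}

/* Variant 2: int counter plus the empty asm statement from the quoted mail. */
static void
fill_int_barrier(DemoSlot *slot, const uintptr_t *src, int natts)
{
    assert(slot->nvalid >= 0 && slot->nvalid <= natts);
    for (int attnum = slot->nvalid; attnum < natts; attnum++)
    {
#if defined(__GNUC__)
        /* Emits no instructions; tells the compiler attnum may have changed. */
        __asm__ volatile ("" : "+r"(attnum) : :);
#endif
        slot->values[attnum] = src[attnum];
        slot->isnull[attnum] = false;
    }
    slot->nvalid = natts;
}

/* Variant 3: size_t counter, the change reported above to fix the codegen. */
static void
fill_size_t(DemoSlot *slot, const uintptr_t *src, int natts)
{
    assert(slot->nvalid >= 0 && slot->nvalid <= natts);
    for (size_t attnum = (size_t) slot->nvalid; attnum < (size_t) natts; attnum++)
    {
        slot->values[attnum] = src[attnum];
        slot->isnull[attnum] = false;
    }
    slot->nvalid = natts;
}
```

All three fill the entries from nvalid up to natts and leave earlier entries
untouched, so swapping one for another should not change behavior, only what
the optimizer is allowed to assume about the counter.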
Greetings,
Andres Freund