Re: More speedups for tuple deformation

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: More speedups for tuple deformation
Date: 2026-01-23 16:33:26
Message-ID: rvlc7pb6zn4kydqovcqh72lf2qfcgs3qkj2seq7tcpvxyqwtqt@nrvv6lpehwwa
Lists: pgsql-hackers

Hi,

On 2026-01-22 20:18:21 -0500, Andres Freund wrote:
> I haven't yet looked at the new version of the patch, but I ran your benchmark
> from upthread (fwiw, I removed the sleep 10 to reduce runtimes, the results
> seem stable enough anyway) on two Intel machines, as you mentioned that you
> saw a lot of variation in Azure.
>
> For both I disabled turbo boost, cpu idling and pinned the backend to a single
> CPU core.
>
> There's a bit of noise on "awork3" (basically an editor and an idle browser
> window), but everything is pinned to the other socket. "awork4" is entirely
> idle.
>
>
> Looks like overall the results are quite impressive! Some of the extra_cols=0
> runs on Sapphire Rapids are a bit slower, but the losses are much smaller than
> the gains in other cases.
>
>
> I think it'd be good to add a few test cases of "incremental deforming" to the
> benchmark. E.g. a qual that accesses column 10, but projection then deforms up
> to 20. I'm a bit worried that e.g. the repeated first_null_attr()
> computations could cause regressions.

The overhead of the aggregation etc. makes it harder to see efficiency changes
in deformation speed.

I think it'd be worth replacing the SUM(a) with WHERE a < 0 (filtering all
rows), to reduce the cost of the executor dispatch.
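
To make the two query shapes concrete, here is a minimal sketch. It uses
Python's built-in sqlite3 purely as a runnable stand-in, not PostgreSQL; the
table name "t" and the row count are assumptions, and only the column "a" and
the two query forms come from the discussion above. It is not a benchmark. It
only shows that both forms have to read column a of every row, while the
filtering form passes nothing on to later processing.

```python
import sqlite3

# Stand-in table: the name "t" and the 1000 rows are assumptions made for
# this illustration; the real tables come from the generated bench.sql.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER NOT NULL)")
conn.executemany("INSERT INTO t (a) VALUES (?)", ((i,) for i in range(1000)))

# Original shape: every row is read and also fed through the aggregate.
agg_rows = conn.execute("SELECT SUM(a) FROM t").fetchall()

# Proposed shape: every row is still read and column a evaluated, but the
# qual rejects all rows because a is never negative in this data.
filtered_rows = conn.execute("SELECT a FROM t WHERE a < 0").fetchall()

print(agg_rows)
print(filtered_rows)
```

With all values of a non-negative, the second query returns no rows, so the
per-row work that remains is the scan, the deformation up to a, and the
comparison.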

Here's a profile of the SUM(a):

- 99.90% 0.00% postgres postgres [.] standard_ExecutorRun
- standard_ExecutorRun
- 96.83% ExecAgg
- 49.86% ExecInterpExpr
- 28.30% slot_getsomeattrs_int
tts_buffer_heap_getsomeattrs
0.67% tts_buffer_heap_getsomeattrs
+ 0.02% asm_sysvec_apic_timer_interrupt
- 37.44% fetch_input_tuple
- 31.42% ExecSeqScan
+ 20.58% heap_getnextslot
3.58% MemoryContextReset
0.52% heapgettup_pagemode
0.32% ExecStoreBufferHeapTuple
0.99% heap_getnextslot
0.79% MemoryContextReset
2.81% int4_sum
1.39% MemoryContextReset

That takes ~93ms on average for the first generated bench.sql.

- 99.88% 0.00% postgres postgres [.] standard_ExecutorRun
- standard_ExecutorRun
- 95.78% ExecSeqScanWithQual
- 57.65% ExecInterpExpr
- 29.08% slot_getsomeattrs_int
tts_buffer_heap_getsomeattrs
0.49% tts_buffer_heap_getsomeattrs
- 25.40% heap_getnextslot
+ 15.00% heapgettup_pagemode
+ 4.71% ExecStoreBufferHeapTuple
0.05% UnlockBuffer
1.80% MemoryContextReset
0.77% int4lt
0.52% heapgettup_pagemode
0.47% ExecStoreBufferHeapTuple
0.37% slot_getsomeattrs_int
2.11% heap_getnextslot
1.49% ExecInterpExpr
0.50% MemoryContextReset

Same data, but with a WHERE a < 0, takes on average ~74ms.

I wonder if it's worth writing a C helper to test deformation in a bit more
targeted way.

Looking at the profile of ExecSeqScanWithQual() made me a bit sad, turns out
that some of the generated code isn't great :(. I'll start a separate thread
about that.

Greetings,

Andres Freund
