| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
| Cc: | John Naylor <johncnaylorls(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: More speedups for tuple deformation |
| Date: | 2026-02-24 14:39:16 |
| Message-ID: | rbxc2qqhsvzxpukgd36caoa4ydgn5r22fxktxanrkn6nobg7j6@27b4vogohgu2 |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On 2026-02-24 15:23:17 +1300, David Rowley wrote:
> The changes in 0004 and 0005 are new. 0004 makes calling
> slot_getmissingattrs() the responsibility of the
> TupleTableSlotOps.getsomeattrs() function. Doing this allows
> getsomeattrs() to be called with the sibling call optimisation in
> slot_getsomeattrs_int() and since slot_getsomeattrs_int() is such a
> trivial function now, I ended up just modifying slot_getsomeattrs() to
> call getsomeattrs() in a way that allows the compiler to apply the
> sibling call optimisation. This seems to help reduce some overheads
> and makes the 0 extra column tests look better.
ISTM we should just merge 0004. In my testing it's a very clear win, without,
afaict, any downsides.
> 0005 reduces the size of CompactAttribute. It shrinks the struct down
> to 8 bytes from 16 by using some bitflags for some lesser-used
> booleans and by shrinking attcacheoff down to int16. The idea is that
> we just don't cache any offsets larger than 2^15. It's likely if we
> get a tuple that big that there's a variable-length attribute anyway,
> which caching the offset of isn't possible.
>
> I'm not getting great results from benchmarking the 0005 patch. I
> verified that gcc does access the array without calculating the
> element address from scratch each time and calculates it once, then
> increments the pointer by sizeof(CompactAttribute). See the attached
> .csv for the results on the 3 machines I tested on.
FWIW, where I had seen that be rather beneficial is the TupleDescCompactAttr()
at the start of the various loops, where the compiler has little choice to
compute the address of the tupdesc->compact_attrs[firstNeededCol]. That
matters only when only deforming a small number of columns, of course.
> I've also resequenced the patches to make the deform_bench test module
> part of the 0001 patch. This makes it easier to test the performance
> of master.
What are your thoughts about merging the deform_bench tooling? I wonder if we
should have src/test/modules/benchmark_tools or such, so we can add a few more
micro-benchmarky tools over time?
Greetings,
Andres Freund
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Aleksander Alekseev | 2026-02-24 14:58:38 | Re: [PATCH] Refactor *_abbrev_convert() functions |
| Previous Message | Sami Imseih | 2026-02-24 14:19:55 | Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds |