| From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | John Naylor <johncnaylorls(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: More speedups for tuple deformation |
| Date: | 2026-02-24 02:23:17 |
| Message-ID: | CAApHDvodSVBj3ypOYbYUCJX+NWL=VZs63RNBQ_FxB_F+6QXF-A@mail.gmail.com |
| Lists: | pgsql-hackers |

I've attached an updated version of the patch (v9). This includes some
changes to the main patch so that we no longer access the tuple's
natts when we're only fetching attributes up to the maximum guaranteed
column. I've also included what John mentioned about using
pg_rightmost_one_pos32() directly rather than trying to reinvent it in
a slightly more optimal way.

The changes in 0004 and 0005 are new. 0004 makes calling
slot_getmissingattrs() the responsibility of the
TupleTableSlotOps.getsomeattrs() function. Doing this allows
getsomeattrs() to be called with the sibling call optimisation in
slot_getsomeattrs_int(). Since slot_getsomeattrs_int() is such a
trivial function now, I ended up just modifying slot_getsomeattrs() to
call getsomeattrs() in a way that allows the compiler to apply the
sibling call optimisation. This seems to help reduce some overheads
and makes the 0 extra column tests look better.
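To illustrate the shape that permits a sibling (tail) call, here is a minimal sketch. It is not the code from the 0004 patch: `DemoSlot`, `DemoSlotOps` and the `demo_*` functions are hypothetical, heavily simplified stand-ins for `TupleTableSlot`, `TupleTableSlotOps` and the real functions. The point is only that once the callback handles missing attributes itself, the indirect call is the last thing the wrapper does, so an optimising compiler is free to emit a jump instead of a call and return.

```c
/* Hypothetical, simplified stand-ins for TupleTableSlot and its ops table. */
typedef struct DemoSlot DemoSlot;

typedef struct DemoSlotOps
{
    void        (*getsomeattrs) (DemoSlot *slot, int natts);
} DemoSlotOps;

struct DemoSlot
{
    const DemoSlotOps *ops;
    int         nvalid;         /* attributes made valid so far */
    int         tuple_natts;    /* attributes physically stored in the tuple */
    int         missing_filled; /* demo counter: attributes filled as missing */
};

/*
 * Demo callback: "deforms" what the tuple physically has, then fills in any
 * remaining attributes itself rather than leaving that to the caller.
 */
static void
demo_getsomeattrs(DemoSlot *slot, int natts)
{
    int         avail = (natts < slot->tuple_natts) ? natts : slot->tuple_natts;

    slot->nvalid = avail;
    if (avail < natts)
    {
        slot->missing_filled += natts - avail;
        slot->nvalid = natts;
    }
}

static const DemoSlotOps demo_ops = {demo_getsomeattrs};

/*
 * Wrapper: nothing happens after the indirect call, so the call sits in
 * tail position and the compiler may turn it into a sibling call.
 */
static inline void
demo_slot_getsomeattrs(DemoSlot *slot, int natts)
{
    if (slot->nvalid < natts)
        slot->ops->getsomeattrs(slot, natts);
}
```

If the wrapper had to inspect `slot->nvalid` again after the callback returned (for example, to fill missing attributes itself), the call would no longer be in tail position and the optimisation could not apply.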

I've also modified slot_getmissingattrs(), replacing the memsets with a
for loop to zero the tts_values and set the nulls in the tts_isnull
array. Other experiments showed that doing this in a loop is
faster than memset, so I applied those learnings there too. I've moved
the elog(ERROR) that checks for invalid attnums into there too. That
does mean we'll deform a tuple before raising that error, but I don't
see the issue with that given that it's a "can't happen" error anyway.
Moving it there saves a compare and jump.
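As a rough sketch of the loop-instead-of-memset idea (not the patch's actual code): `demo_fill_missing` and the `Datum` typedef below are stand-ins I've made up for illustration, and the arrays play the role of `tts_values` and `tts_isnull`. Whether this beats two memset calls will depend on the compiler and on how many attributes are typically filled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Datum;        /* stand-in for PostgreSQL's Datum type */

/*
 * Mark attributes [startAttNum, lastAttNum) as NULL using a single loop
 * over both arrays instead of one memset per array.
 */
static void
demo_fill_missing(Datum *values, bool *isnull, int startAttNum, int lastAttNum)
{
    for (int attnum = startAttNum; attnum < lastAttNum; attnum++)
    {
        values[attnum] = (Datum) 0;
        isnull[attnum] = true;
    }
}
```

For the short ranges involved here, one plausible reason a loop could win is that it avoids the call overhead and size-dispatch logic of two separate memset calls, but that is an assumption on my part rather than something stated above.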

0005 reduces the size of CompactAttribute. It shrinks the struct down
to 8 bytes from 16 by using some bitflags for some lesser-used
booleans and by shrinking attcacheoff down to int16. The idea is that
we just don't cache any offsets larger than 2^15. It's likely that a
tuple that big contains a variable-length attribute anyway, in which
case the offset couldn't be cached anyway.
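To make the size arithmetic concrete, here is one possible 8-byte layout. This is only a sketch: the field order, the flag names and the use of -1 as the "not cached" marker are assumptions for illustration, not the layout from the 0005 patch.

```c
#include <stdint.h>

/* Hypothetical flag bits replacing individual bool fields. */
#define DEMO_ATTR_BYVAL       0x01
#define DEMO_ATTR_PACKABLE    0x02
#define DEMO_ATTR_HASMISSING  0x04
#define DEMO_ATTR_DROPPED     0x08
#define DEMO_ATTR_GENERATED   0x10

typedef struct DemoCompactAttr
{
    int16_t     attcacheoff;    /* cached offset, or -1 if not cached */
    int16_t     attlen;         /* fixed length, or negative for varlena */
    uint8_t     attalignby;     /* alignment requirement in bytes */
    uint8_t     flags;          /* DEMO_ATTR_* bits */
    char        attnullability; /* nullability state */
    uint8_t     unused;         /* explicit padding to 8 bytes */
} DemoCompactAttr;

_Static_assert(sizeof(DemoCompactAttr) == 8, "DemoCompactAttr must be 8 bytes");

/* Store an offset only if it fits in int16; otherwise leave it uncached. */
static inline void
demo_set_cacheoff(DemoCompactAttr *att, int32_t off)
{
    att->attcacheoff = (off >= 0 && off <= INT16_MAX) ? (int16_t) off : -1;
}
```

With 8-byte elements, two array entries fit where one did before, so twice as many attributes share a cache line. Whether that shows up as a measurable win is a separate question.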

I'm not getting great results from benchmarking the 0005 patch. I
verified that gcc doesn't calculate the element address from scratch
on each array access; it calculates it once, then increments the
pointer by sizeof(CompactAttribute). See the attached .csv for the
results on the 3 machines I tested on.

I've also resequenced the patches to make the deform_bench test module
part of the 0001 patch. This makes it easier to test the performance
of master.

I've not yet made it so the TTS_FLAG_OBEYS_NOT_NULL_CONSTRAINTS
tts_flag gets set in all places it currently could be set. There are a
few more scan types where it could be set. I understand you mentioned
that you thought the flag should disable the optimisation rather than
enable it, but I've not yet looked to check all the places that it
needs to be disabled.

David

| Attachment | Content-Type | Size |
|---|---|---|
| v9-0001-Introduce-deform_bench-test-module.patch | text/plain | 7.3 KB |
| v9-0002-Add-empty-TupleDescFinalize-function.patch | text/plain | 29.0 KB |
| v9-0003-Optimize-tuple-deformation.patch | text/plain | 59.9 KB |
| v9-0004-Allow-sibling-call-optimization-in-slot_getsomeat.patch | text/plain | 8.5 KB |
| v9-0005-Reduce-size-of-CompactAttribute-struct-to-8-bytes.patch | text/plain | 5.3 KB |
| deform_results_v9.csv | text/csv | 25.7 KB |