| From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: More speedups for tuple deformation |
| Date: | 2026-01-18 22:13:16 |
| Message-ID: | CAApHDvoh3Q413szd-zsUTCpQPWNdpUYvx-fvsB8DP8zOja+ckg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Fri, 2 Jan 2026 at 18:58, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> Please find attached an updated set of patches. A rebase was needed,
> plus 0003 had a problem with an Assert not handling the bitmap being a
> NULL pointer.
Another rebase, plus updates adding some TupleDescFinalize() calls that
were missing from newly added code.
I've also attached another round of benchmarks, this time including
some Azure machines to fill the gap left by my lack of any Intel
results. I think these are somewhat noisy, as I opted for low
core-count instances, which share their L3 cache with other customers'
workloads. This is most evident on the Xeon_E5-2673 with gcc, where the
patched run was nearly twice as fast as unpatched for test 2 with 20
extra columns. The raw results for that machine show the times varying
considerably across the 3 runs of each test, which makes me believe
the machine was busy with other work when that test ran on master. The
AMD3990x and M2 machines are sitting next to me and were otherwise
idle, so their results should be much more stable.
Quite a few machines show a small regression for the 0 extra column
tests. The deforming function does a small amount of extra work to
check whether attnum is below the first attribute without an
attcacheoff. This mostly affects only the tests that don't do any
deforming with a cached attcacheoff, e.g. due to NULLs or varlena
types. The only way I've thought of to possibly reduce that is to
invent a new TupleTableSlotOps and pick the applicable one when
creating the TupleTableSlot. That doesn't appeal to me very much, as it
requires modifying many callsites. But I do wonder if we should try to
come up with something here, as technically we could use this to
eliminate alignment padding from some MinimalTuples in cases where they
were not directly derived from pre-formed HeapTuples. That could allow
a more compact tuple representation for sorting and hashing, letting us
do more with less memory in some cases.
The benchmark results also indicated that there wasn't much advantage
to the 0002+0003 patches, so I've removed those from the set. That
reduces some complexity around the benchmarks. I did still keep the
OPTIMIZE_BYVAL loop as a separate set of results, as it's not clear
what's best there: the machines vary in which variant they prefer.
Benchmark results are attached in the bz2 file, both in spreadsheet
form and as pg_dumped raw results.
David
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Precalculate-CompactAttribute-s-attcacheoff.patch | text/plain | 73.2 KB |
| deform_results2.tar.bz2 | application/x-compressed | 416.1 KB |