| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de>, David Rowley <dgrowleyml(at)gmail(dot)com> |
| Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: More speedups for tuple deformation |
| Date: | 2026-01-23 05:29:22 |
| Message-ID: | 82AD055C-3280-4DFB-ADA8-A7A4DE3844A5@gmail.com |
| Lists: | pgsql-hackers |
> On Jan 23, 2026, at 09:18, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> I haven't yet looked at the new version of the patch, but I ran your benchmark
> from upthread (fwiw, I removed the sleep 10 to reduce runtimes, the results
> seem stable enough anyway) on two intel machines, as you mentioned that you
> saw a lot of variation in Azure.
>
> For both I disabled turbo boost, cpu idling and pinned the backend to a single
> CPU core.
>
> There's a bit of noise on "awork3" (basically an editor and an idle browser
> window), but everything is pinned to the other socket. "awork4" is entirely
> idle.
>
>
> Looks like overall the results are quite impressive! Some of the extra_cols=0
> runs on Sapphire Rapids are a bit slower, but the losses are much smaller than the
> gains in other cases.
>
>
> I think it'd be good to add a few test cases of "incremental deforming" to the
> benchmark. E.g. a qual that accesses column 10, but projection then deforms up
> to 20. I'm a bit worried that e.g. the repeated first_null_attr()
> computations could cause regressions.
>
>
> Greetings,
>
> Andres Freund
> <deform_bench.csv>
Today I ran the benchmark on my MacBook M4 against three builds (all compiled without assertions and with -O2):
1) Master (f9a468c664a)
2) Master + v4
3) Master + v4 + My tweak (first_null_attr immediately returns 0 when natts == 0)
Overall, v4 shows significant improvements across most configuration combinations. In the best case, v4 is about 43% faster than master.
The tweak version is only slightly faster than v4. In the best case, the tweak achieves an additional ~3.5% improvement over v4.
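As a rough illustration of the tweak in version 3, here is a minimal C sketch. It is not the code from the patch: the name first_null_attr_sketch, the helper lowest_set_bit, and the byte-at-a-time scan are assumptions made for illustration. The only parts taken from the description above are the early return of 0 when natts == 0 and the usual heap-tuple convention that a clear bit in the null bitmap marks a NULL attribute.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the lowest set bit; v must be non-zero. */
static int
lowest_set_bit(uint8_t v)
{
    int n = 0;

    assert(v != 0);
    while ((v & 1) == 0)
    {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * Hypothetical sketch, not the patch's implementation.  Returns the index
 * of the first NULL attribute described by the null bitmap "bits", or natts
 * if none of the first natts attributes is NULL.  A clear bit means NULL.
 */
static int
first_null_attr_sketch(const uint8_t *bits, int natts)
{
    int nbytes;

    /* The tweak: nothing to scan, so return before touching the bitmap. */
    if (natts == 0)
        return 0;

    nbytes = (natts + 7) >> 3;
    for (int i = 0; i < nbytes; i++)
    {
        /* Invert so that set bits mark NULL attributes. */
        uint8_t nulls = (uint8_t) ~bits[i];

        if (nulls != 0)
        {
            int attnum = i * 8 + lowest_set_bit(nulls);

            /* Padding bits past natts may be clear; clamp to natts. */
            return attnum < natts ? attnum : natts;
        }
    }
    return natts;
}
```

In this sketch the early return does not change the result, since the loop would also yield 0 for natts == 0. It only skips the setup work, which fits a small gain rather than a large one.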
Note that the MacBook is my working laptop. I didn't actively use it while the tests were running, but it was not fully idle either, as some other applications (email, VS Code, etc.) were running in the background. That said, all three builds ran under the same conditions, so I think the comparison is still fair.
See the attached Excel sheet for details.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| Attachment | Content-Type | Size |
|---|---|---|
| pgbench_comparison_chao_li_mac_m4.xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | 15.8 KB |
| unknown_filename | text/plain | 1 byte |