| From: | Chengpeng Yan <chengpeng_yan(at)Outlook(dot)com> |
|---|---|
| To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Cc: | John Naylor <johncnaylorls(at)gmail(dot)com> |
| Subject: | [PATCH] ANALYZE: hash-accelerate MCV tracking for equality-only types |
| Date: | 2026-01-21 14:13:56 |
| Message-ID: | C71ABD3E-93C7-41A5-99AC-77279557D36A@Outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
`compute_distinct_stats()` is used for data types that have an equality
operator but no ordering (so we can't use the sort-based path). It
keeps a `track[]` array of candidate MCVs and, for each sampled row,
searches that array for a match. With high `statistics_target`, that
becomes O(n) per sample and can dominate ANALYZE time.
This patch keeps the existing behavior but adds a fast lookup path:
- When `attstattarget >= 100`, and the type's default hash support
matches the equality operator, build a small `simplehash` table mapping
a tracked Datum to its current `track[]` slot (average O(1) match
lookup).
- Otherwise, fall back to the existing linear scan.
While here, the singleton-eviction logic changes from repeatedly
shifting the count=1 region to a simple round-robin cursor over that
region. This keeps replacement O(1) and (when hashing is enabled)
avoids having to update many hash->index mappings.
Performance
I used a small harness focusing on this code path (xid and aclitem, with
match-heavy / unique-heavy / Zipf-ish distributions).
Setup:
- MacBook Pro (Apple M4 Max, 36GB RAM), macOS 26.1 (Darwin 25.1.0)
- Reported numbers are median of 5 runs
- For the "after" numbers below, I set the hash threshold to 0 to show
the potential benefit; the patch as attached enables hashing only for
`attstattarget >= 100`.
Results (ms; before -> after):
obj | statistics_target | before_ms | after_ms | speedup_x
----+------------------+----------+---------+---------
bench_aclitem.x_match | 25 | 174.154 | 182.346 | 0.96
bench_aclitem.x_match | 50 | 55.708 | 55.693 | 1.00
bench_aclitem.x_match | 100 | 74.311 | 63.978 | 1.16
bench_aclitem.x_match | 200 | 143.621 | 86.790 | 1.65
bench_aclitem.x_match | 500 | 462.616 | 125.083 | 3.70
bench_aclitem.x_match | 1000 | 1590.918 | 202.849 | 7.84
bench_aclitem.x_match | 2000 | 5857.253 | 406.840 | 14.40
bench_aclitem.x_match | 5000 | 30844.177 | 1678.826 | 18.37
bench_aclitem.x_match | 10000 | 73134.141 | 9207.071 | 7.94
bench_aclitem.x_unique | 25 | 18.214 | 17.962 | 1.01
bench_aclitem.x_unique | 50 | 39.305 | 37.045 | 1.06
bench_aclitem.x_unique | 100 | 76.555 | 64.800 | 1.18
bench_aclitem.x_unique | 200 | 151.416 | 90.619 | 1.67
bench_aclitem.x_unique | 500 | 497.243 | 134.351 | 3.70
bench_aclitem.x_unique | 1000 | 1788.755 | 209.299 | 8.55
bench_aclitem.x_unique | 2000 | 7249.638 | 332.865 | 21.78
bench_aclitem.x_unique | 5000 | 50785.046 | 604.944 | 83.95
bench_aclitem.x_unique | 10000 | 195506.022 | 792.658 | 246.65
bench_aclitem.x_zipf | 25 | 17.451 | 19.770 | 0.88
bench_aclitem.x_zipf | 50 | 37.131 | 35.560 | 1.04
bench_aclitem.x_zipf | 100 | 76.053 | 67.494 | 1.13
bench_aclitem.x_zipf | 200 | 145.435 | 100.484 | 1.45
bench_aclitem.x_zipf | 500 | 429.574 | 131.647 | 3.26
bench_aclitem.x_zipf | 1000 | 1313.187 | 223.918 | 5.86
bench_aclitem.x_zipf | 2000 | 4117.637 | 445.521 | 9.24
bench_aclitem.x_zipf | 5000 | 14863.325 | 1189.886 | 12.49
bench_aclitem.x_zipf | 10000 | 31706.434 | 2269.572 | 13.97
bench_xid.x_match | 25 | 22.136 | 56.767 | 0.39
bench_xid.x_match | 50 | 42.182 | 40.947 | 1.03
bench_xid.x_match | 100 | 81.089 | 67.848 | 1.20
bench_xid.x_match | 200 | 118.315 | 80.993 | 1.46
bench_xid.x_match | 500 | 403.674 | 120.421 | 3.35
bench_xid.x_match | 1000 | 1246.068 | 171.385 | 7.27
bench_xid.x_match | 2000 | 4374.122 | 406.823 | 10.75
bench_xid.x_match | 5000 | 21961.532 | 1903.715 | 11.54
bench_xid.x_match | 10000 | 55453.808 | 11035.402 | 5.03
bench_xid.x_unique | 25 | 21.062 | 20.011 | 1.05
bench_xid.x_unique | 50 | 40.473 | 41.414 | 0.98
bench_xid.x_unique | 100 | 79.711 | 67.382 | 1.18
bench_xid.x_unique | 200 | 125.255 | 72.853 | 1.72
bench_xid.x_unique | 500 | 412.332 | 96.030 | 4.29
bench_xid.x_unique | 1000 | 1373.882 | 144.837 | 9.49
bench_xid.x_unique | 2000 | 5084.898 | 277.216 | 18.34
bench_xid.x_unique | 5000 | 31412.454 | 574.693 | 54.66
bench_xid.x_unique | 10000 | 126639.495 | 804.947 | 157.33
bench_xid.x_zipf | 25 | 21.915 | 20.758 | 1.06
bench_xid.x_zipf | 50 | 41.458 | 39.498 | 1.05
bench_xid.x_zipf | 100 | 78.128 | 70.466 | 1.11
bench_xid.x_zipf | 200 | 118.361 | 79.816 | 1.48
bench_xid.x_zipf | 500 | 351.440 | 114.009 | 3.08
bench_xid.x_zipf | 1000 | 1049.475 | 173.981 | 6.03
bench_xid.x_zipf | 2000 | 3162.409 | 404.565 | 7.82
bench_xid.x_zipf | 5000 | 11170.524 | 1346.067 | 8.30
bench_xid.x_zipf | 10000 | 23928.836 | 2549.381 | 9.39
The bench harness (bench/) is attached for reproduction.
Patch attached.
--
Best regards,
Chengpeng Yan
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-ANALYZE-speed-up-MCV-tracking-for-equality-only-t.patch | application/octet-stream | 9.9 KB |
| bench.zip | application/zip | 4.7 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Eisentraut | 2026-01-21 14:19:17 | Re: AIX support |
| Previous Message | Peter Eisentraut | 2026-01-21 13:55:36 | Re: Remove more leftovers of AIX support |