| From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
|---|---|
| To: | Chengpeng Yan <chengpeng_yan(at)Outlook(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Cc: | John Naylor <johncnaylorls(at)gmail(dot)com> |
| Subject: | Re: [PATCH] ANALYZE: hash-accelerate MCV tracking for equality-only types |
| Date: | 2026-01-21 20:44:08 |
| Message-ID: | c812c68c-19bd-4965-adaf-3f017b8c40db@tantorlabs.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Chengpeng,
On 21.01.2026 17:13, Chengpeng Yan wrote:
> `compute_distinct_stats()` is used for data types that have an equality
> operator but no ordering (so we can't use the sort-based path). It
> keeps a `track[]` array of candidate MCVs and, for each sampled row,
> searches that array for a match. With high `statistics_target`, that
> becomes O(n) per sample and can dominate ANALYZE time.
Nice catch - this is indeed not first time we run into an O(N^2)
bottleneck in MCV array and address it with hash-based lookup. Thanks
for working on this!
I have a couple of follow-up questions after a quick look at the patch:
1. The hash table is created but I do not see a corresponding destroy call.
2. In the original code, when a value was not found using the
nested-loop search, the singleton (count = 1) region was shifted to make
room for the new entry. After the patch, I no longer see this shifting
logic. I might be missing something, but it looks like the non-hash path
now follows the same replacement behavior as the hash-based one.
I’ll need a bit more time for a deeper review and some local benchmarks,
but overall the approach looks interesting.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Marcos Magueta | 2026-01-21 20:44:27 | Re: WIP - xmlvalidate implementation from TODO list |
| Previous Message | Corey Huinker | 2026-01-21 20:42:21 | Re: Extended Statistics set/restore/clear functions. |