Re: [PATCH] ANALYZE: hash-accelerate MCV tracking for equality-only types

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Chengpeng Yan <chengpeng_yan(at)Outlook(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>
Subject: Re: [PATCH] ANALYZE: hash-accelerate MCV tracking for equality-only types
Date: 2026-01-21 20:44:08
Message-ID: c812c68c-19bd-4965-adaf-3f017b8c40db@tantorlabs.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Chengpeng,

On 21.01.2026 17:13, Chengpeng Yan wrote:
> `compute_distinct_stats()` is used for data types that have an equality
> operator but no ordering (so we can't use the sort-based path).  It
> keeps a `track[]` array of candidate MCVs and, for each sampled row,
> searches that array for a match.  With high `statistics_target`, that
> becomes O(n) per sample and can dominate ANALYZE time.

Nice catch - this is indeed not first time we run into an O(N^2)
bottleneck in MCV array and address it with hash-based lookup. Thanks
for working on this!

I have a couple of follow-up questions after a quick look at the patch:

1. The hash table is created but I do not see a corresponding destroy call.
2. In the original code, when a value was not found using the
nested-loop search, the singleton (count = 1) region was shifted to make
room for the new entry. After the patch, I no longer see this shifting
logic. I might be missing something, but it looks like the non-hash path
now follows the same replacement behavior as the hash-based one.

I’ll need a bit more time for a deeper review and some local benchmarks,
but overall the approach looks interesting.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Marcos Magueta 2026-01-21 20:44:27 Re: WIP - xmlvalidate implementation from TODO list
Previous Message Corey Huinker 2026-01-21 20:42:21 Re: Extended Statistics set/restore/clear functions.