Re: [PATCH] ANALYZE: hash-accelerate MCV tracking for equality-only types

From: Chengpeng Yan <chengpeng_yan(at)Outlook(dot)com>
To: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, John Naylor <johncnaylorls(at)gmail(dot)com>
Subject: Re: [PATCH] ANALYZE: hash-accelerate MCV tracking for equality-only types
Date: 2026-01-22 09:40:38
Message-ID: FACC73B3-C4FC-4206-A887-1F3EE75AD37C@Outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

> On Jan 22, 2026, at 04:44, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> wrote:
>
> Nice catch - this is indeed not first time we run into an O(N^2) bottleneck in MCV array and address it with hash-based lookup. Thanks for working on this!

Thanks for the review. This change was actually inspired by your earlier
patch on a similar issue, so thanks a lot for that as well.

> I have a couple of follow-up questions after a quick look at the patch:
> 1. The hash table is created but I do not see a corresponding destroy call.

The hash table is allocated in the per-column temporary memory context
(the “Analyze Column” context that is active while compute_stats()
runs). That context is MemoryContextReset() after each column and
eventually deleted at the end of ANALYZE, so the table is reclaimed
automatically.

This matches the existing convention noted at the end of
compute_distinct_stats() (“We don’t need to bother cleaning up any of
our temporary palloc’s”).

That said, I can add an explicit DistinctHash_destroy() before returning
for clarity, although it shouldn’t be required for correctness or leak
prevention.

> 2. In the original code, when a value was not found using the nested-loop search, the singleton (count = 1) region was shifted to make room for the new entry. After the patch, I no longer see this shifting logic. I might be missing something, but it looks like the non-hash path now follows the same replacement behavior as the hash-based one.

In the original code, inserting a new distinct value into the singleton
(count = 1) region effectively evicted the oldest singleton (FIFO) by
inserting at the head of that region and shifting the tail, which is
O(n) per new distinct value.

The patch replaces this with a FIFO eviction cursor (effectively a
circular / round-robin cursor) over the count = 1 region, avoiding
repeated shifts. When hashing is enabled, this also avoids having to
update many hash→index mappings due to element shifts.

I kept the non-hash path using the same mechanism so that the hash and
fallback paths follow the same replacement behavior, with the intention
of preserving the original FIFO semantics while avoiding the O(n) cost.

I’ve also posted a v2 update. It adjusts the eviction cursor when a
count = 1 value is promoted (via the existing bubble-up logic), so that
the cursor-based eviction is now exactly equivalent to the original
shift-based FIFO behavior.

I’ve also updated the relevant comments to clarify this behavior. The v2
patch is attached.

--
Best regards,
Chengpeng Yan

Attachment Content-Type Size
v2-0001-ANALYZE-speed-up-MCV-tracking-for-equality-only-t.patch application/octet-stream 10.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2026-01-22 09:50:26 Re: refactor architecture-specific popcount code
Previous Message Chao Li 2026-01-22 09:38:14 Re: DOC: fixes multiple errors in alter table doc