Quick Links

Re: Use merge-based matching for MCVs in eqjoinsel

From:	David Geier <geidav(dot)pg(at)gmail(dot)com>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
Subject:	Re: Use merge-based matching for MCVs in eqjoinsel
Date:	2025-09-08 10:08:59
Message-ID:	5c684d71-f618-4f4b-bd1b-608c0ae3f6d7@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi Ilia!

On 05.09.2025 16:03, David Geier wrote:
>>> I propose an optimization: when the column datatype supports
>>> ordering(i.e., has < and >), we can sort both MCV lists and apply
>>> mege-style algorithm to detect matches. This reduces runtime from
>>> O(N^2) to O(NlogN), where N is the number of MCV entries. The patch
>>> also applies the same optimization to semi-join clauses, which show
>>> similar performance behavior.
>
> Why do you sort both lists and then merge instead of putting the smaller
> list into a hash map and then doing hash lookups (if the type is hashable)?

I've tested your and my code with the following script:

CREATE TABLE foo(col TEXT);
CREATE TABLE bar(col TEXT);
SET default_statistics_target = 10000;

-- Generate MCV values. PostgreSQL doesn't store MCVs if the table has
-- only a single value or every value has exactly the same cardinality.
DO $$
BEGIN
FOR i IN 1..10000 LOOP
FOR j IN 1..least(i, 50) LOOP
INSERT INTO foo VALUES ('aaaaaaaaaaaaaaaaaaaa' || i::TEXT);
INSERT INTO bar VALUES ('aaaaaaaaaaaaaaaaaaaa' || (i + 100000)::TEXT);
END LOOP;
END LOOP;
END;
$$;

ANALYZE foo, bar;
\timing on
EXPLAIN SELECT * FROM foo f, bar b WHERE f.col = b.col;

Results are:

- master: 433 ms
- Order+Merge: 11 ms
- Hash map: 4 ms

I've attached my draft patch.

--
David Geier

Attachment	Content-Type	Size
0001-Optimize-eqoinsel_inner-with-hash-table.patch	text/x-patch	7.8 KB

In response to

Re: Use merge-based matching for MCVs in eqjoinsel at 2025-09-05 14:03:02 from David Geier

Responses

Re: Use merge-based matching for MCVs in eqjoinsel at 2025-09-08 10:35:50 from Ilia Evdokimov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Adrien Nayrat	2025-09-08 10:32:57	Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
Previous Message	Andrei Lepikhov	2025-09-08 10:05:03	Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6