Re: Use merge-based matching for MCVs in eqjoinsel

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Use merge-based matching for MCVs in eqjoinsel
Date: 2025-09-05 14:03:02
Message-ID: b6316b99-565b-4c89-aa08-6aea51f54526@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ilia!

On 29.07.2025 16:07, Ilia Evdokimov wrote:
>
> On 21.07.2025 16:55, Ilia Evdokimov wrote:
>>
>> While analyzing planner performance on JOB with
>> default_statistics_target = 1000, I noticed that a significant portion
>> of planning time is spent inside the eqjoinsel() function. According
>> to perf, in most JOB queries at default_statistics_target = 1000,
>> eqjoinsel() is the most expensive function during planning, accounting
>> for approximately 8% of total CPU time. At default_statistics_target =
>> 10000, the planner spend up to 75% of its time inside eqjoinsel(),
>> making it one of the primary bottlenecks.
>>
>> This overhead is caused by the O(N^2) nested-loop comparison of MCVs
>> in var1 = var2 clauses.

Thanks for working on this. I've wanted to submit a patch for the very
same issue for a while. I've come across this issue multiple times in
the field.

>> I propose an optimization: when the column datatype supports
>> ordering(i.e., has < and >), we can sort both MCV lists and apply
>> mege-style algorithm to detect matches. This reduces runtime from
>> O(N^2) to O(NlogN), where N is the number of MCV entries. The patch
>> also applies the same optimization to semi-join clauses, which show
>> similar performance behavior.

Why do you sort both lists and then merge instead of putting the smaller
list into a hash map and then doing hash lookups (if the type is hashable)?

There are more problems like this in the planner. For example col IN
(many values) is also quadratic because for every value in the IN list
all MCVs are checked. It would be great to fix this as well.

--
David Geier

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-09-05 14:08:10 Re: Appetite for syntactic sugar to match result set columns to UDT fields?
Previous Message Ashutosh Bapat 2025-09-05 13:21:23 Re: Improve pg_sync_replication_slots() to wait for primary to advance