From: | David Geier <geidav(dot)pg(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Use merge-based matching for MCVs in eqjoinsel |
Date: | 2025-09-05 14:03:02 |
Message-ID: | b6316b99-565b-4c89-aa08-6aea51f54526@gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Ilia!
On 29.07.2025 16:07, Ilia Evdokimov wrote:
>
> On 21.07.2025 16:55, Ilia Evdokimov wrote:
>>
>> While analyzing planner performance on JOB with
>> default_statistics_target = 1000, I noticed that a significant portion
>> of planning time is spent inside the eqjoinsel() function. According
>> to perf, in most JOB queries at default_statistics_target = 1000,
>> eqjoinsel() is the most expensive function during planning, accounting
>> for approximately 8% of total CPU time. At default_statistics_target =
>> 10000, the planner spend up to 75% of its time inside eqjoinsel(),
>> making it one of the primary bottlenecks.
>>
>> This overhead is caused by the O(N^2) nested-loop comparison of MCVs
>> in var1 = var2 clauses.
Thanks for working on this. I've wanted to submit a patch for the very
same issue for a while. I've come across this issue multiple times in
the field.
>> I propose an optimization: when the column datatype supports
>> ordering(i.e., has < and >), we can sort both MCV lists and apply
>> mege-style algorithm to detect matches. This reduces runtime from
>> O(N^2) to O(NlogN), where N is the number of MCV entries. The patch
>> also applies the same optimization to semi-join clauses, which show
>> similar performance behavior.
Why do you sort both lists and then merge instead of putting the smaller
list into a hash map and then doing hash lookups (if the type is hashable)?
There are more problems like this in the planner. For example col IN
(many values) is also quadratic because for every value in the IN list
all MCVs are checked. It would be great to fix this as well.
--
David Geier
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-09-05 14:08:10 | Re: Appetite for syntactic sugar to match result set columns to UDT fields? |
Previous Message | Ashutosh Bapat | 2025-09-05 13:21:23 | Re: Improve pg_sync_replication_slots() to wait for primary to advance |