From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Use merge-based matching for MCVs in eqjoinsel |
Date: | 2025-07-21 13:55:56 |
Message-ID: | 20ea8bf5-3569-4e46-92ef-ebb2666debf6@tantorlabs.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi hackers,
While analyzing planner performance on JOB with
default_statistics_target = 1000, I noticed that a significant portion
of planning time is spent inside the eqjoinsel() function. According to
perf, in most JOB queries at default_statistics_target = 1000,
eqjoinsel() is the most expensive function during planning, accounting
for approximately 8% of total CPU time. At default_statistics_target =
10000, the planner spend up to 75% of its time inside eqjoinsel(),
making it one of the primary bottlenecks.
Еhis overhead is caused by the O(N^2) nested-loop comparison of MCVs in
var1 = var2 clauses.
I propose an optimization: when the column datatype supports
ordering(i.e., has < and >), we can sort both MCV lists and apply
mege-style algorithm to detect matches. This reduces runtime from O(N^2)
to O(NlogN), where N is the number of MCV entries. The patch also
applies the same optimization to semi-join clauses, which show similar
performance behavior.
On JOB, this changes reduce planner time in most queries with complex
joins and large MCVs with no observable effect on plan quality. I’ve
also attached bar charts showing per-query planner time before and after
the patch for default_statistics_target = 100, 1000, 10000 along with
query numbers for reference.
Any feedback or suggestions are welcome!
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Attachment | Content-Type | Size |
---|---|---|
JOB_results.zip | application/zip | 370.8 KB |
v1-0001-Optimize-selectivity-estimation-for-Var-Var-clauses.patch | text/x-patch | 16.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Mike Artz | 2025-07-21 14:18:43 | Re: Proposal: QUALIFY clause |
Previous Message | Robert Haas | 2025-07-21 13:40:02 | Re: Test instability when pg_dump orders by OID |