Re: Use merge-based matching for MCVs in eqjoinsel

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Use merge-based matching for MCVs in eqjoinsel
Date: 2025-09-09 09:22:32
Message-ID: 1f6e18c6-9149-4b00-9837-fb8ce5304cf4@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

Thanks for the review.

On 09.09.2025 09:29, Ilia Evdokimov wrote:
> From an architectural perspective, I think the patch is already in good
> shape. However, I have some suggestions regarding code style:
>
> 1. I would move McvHashEntry, McvHashContext, he new hash table
>    definition, hash_mcv and are_mcvs_equal to the top.
Done. I've also moved up try_eqjoinsel_with_hashtable().

> 2. I’m not sure get_hash_func_oid() is needed at all – it seems we
>    could do without it.
Removed.

> 3. It would be better to name the parameters matchProductFrequencies ->
>    matchprodfreq, nMatches -> nmatches, hasMatch1 -> hasmatch1,
>    hasMatch2 -> hasmatch2 in eqjoinsel_inner_with_hashtable().
Done.

> 4. As I wrote earlier, since we now have a dedicated function
>    eqjoinsel_inner_with_hashtable(), perhaps it could be used in both
>    eqjoinsel_inner() and eqjoinsel_semi(). And since the hash-based
Done.

The gains for SEMI join are even bigger because the function is called 3
times for e.g. EXPLAIN SELECT * FROM foo f WHERE EXISTS (SELECT 1 FROM
bar b where f.col = b.col); For that query the planning time for me goes
from ~1300 ms -> 12 ms.

>    search was factored out, maybe it would make sense to also factor
>    out the O(N^2) nested loop implementation?
Generally agreed and while tempting, I've refrained from doing the
refactoring. Let's better do this in a separate patch to keep things simple.

> 5. I think it would be helpful to add a comment explaining that using a
>    hash table is not efficient when the MCV array length equals 1:
>
> if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1)
>     return false;
Done.
>> Did you have a
>> chance to check tables with just few MCVs or are there any queries in
>> the JOB which regress with very small default_statistics_target?
>
>
> Sure. I need some time to check this.
>
Could you please do that with the latest attached patch so that we test
it once more?

Could you also run once more the JOB benchmark to get some test coverage
on the SEMI join code (assuming it also uses SEMI joins)?

Once we've concluded on above and there are no objections, I will
register this patch in the commit fest.

--
David Geier

Attachment Content-Type Size
0002-Optimize-eqjoinsel_inner-and-eqjoinsel_semi.patch text/x-patch 11.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2025-09-09 09:26:48 Re: Issue with logical replication slot during switchover
Previous Message Richard Guo 2025-09-09 09:20:21 Re: Eager aggregation, take 3