| From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | David Geier <geidav(dot)pg(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Use merge-based matching for MCVs in eqjoinsel |
| Date: | 2025-11-10 14:20:42 |
| Message-ID: | f8e6b122-b598-4adb-b91e-89fb3cdc1db5@tantorlabs.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Thanks for the detailed feedback!
On 04.11.2025 00:55, Tom Lane wrote:
> Hmm. Those results sure look like there is a performance regression
> up to at least 100 MCVs ... not a large one, but consistently a few
> percent. That's a bit sad for a patch purporting to improve
> performance. It looks to me like perhaps we should stick to the old
> algorithm up to 100 or possibly even more MCVs. Certainly the
> threshold needs to be higher than 1, as you have it now.
I re-ran the benchmark on JOB with a threshold of 100.Here are the
updated results:
default_statistics_target | Planner Speedup (×) | Planner Before (ms) |
Planner After (ms)
--------------------------------------------------------------------------------
1 | 1.00 | 2320.412 |
2318.377
5 | 0.99 | 2335.894 |
2360.890
10 | 1.00 | 2350.612 |
2347.154
25 | 1.01 | 2365.977 |
2342.312
50 | 0.99 | 2381.554 |
2405.262
75 | 1.00 | 2396.481 |
2399.828
100 | 1.00 | 2410.367 |
2412.456
1000 | 1.11 | 2850.853 |
2564.303
2500 | 2.04 | 5571.688 |
2731.545
5000 | 6.05 | 18850.084 |
3114.692
7500 | 11.96 | 39160.898 |
3273.688
10000 | 19.04 | 71334.113 |
3745.955
> I eyeballed the patch itself very briefly, and have a couple
> quick comments:
>
> * Is hash_msv a typo for hash_mcv? If not, maybe spell out what
> it's supposed to mean.
Yes, that was a typo — fixed.
> * The patch would be easier to read if it didn't reindent a couple
> large chunks of existing code. Can we change the factorization
> to avoid that? If not, I'd recommend submitting without that
> reindentation, and reminding the committer to reindent at the last
> moment.
Fixed as well. I’ve removed all reindentation changes. I will keep that
in mind for future submissions.
> * The calculation loops in eqjoinsel_inner and eqjoinsel_semi
> are not identical, which makes it look quite weird to be
> writing just one function that conditionally replaces both.
> I wonder if we should refactor to have just one copy (and
> just eat the extra cycles of calculating matchprodfreq).
Agreed. I’ve dropped the attempt to merge them into a single function.
>
> * In fact ... I wonder if we should try harder to not do essentially
> identical calculations twice, remembering that eqjoinsel_semi is
> always used alongside eqjoinsel_inner. (Of course, we could only do
> that if clamped_nvalues2 is the same as sslot2->nvalues, but that's
> frequently true.) I think the reason it's duplicative right now
> is that we regarded this semijoin calculation as an experiment and
> so didn't want to invest a lot of effort in it ... but this patch
> is exactly a lot of effort, so maybe it's time to deal with that
> unfinished business.
>
> regards, tom lane
Good point. I addressed this in a separate patch: eqjoinsel_inner() now
saves matchfreq1, matchfreq2, nmatches so that eqjoinsel_semi() can
reuse them when (clamped_nvalues2 == sslot2->nvalues). If the MCV list
on the RHS is clamped, we still recompute locally. If you have a cleaner
idea for how to share these values between the two functions without
passing them explicitly, I’d be happy to consider it.
I’m attaching two patches:
1. v4-0001-Avoid-duplicate-MCV-matching-in-eqjoinsel_semi-an.patch -
removes redundant MCV matching for semi/anti joins;
2. v4-0002-Optimize-MCV-matching-in-eqjoinsel_inner-and-eqjo.patch -
adds hash-based MCV matching with a configurable threshold and includes
fixes based on your comments.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Avoid-duplicate-MCV-matching-in-eqjoinsel_semi-an.patch | text/x-patch | 5.2 KB |
| v4-0002-Optimize-MCV-matching-in-eqjoinsel_inner-and-eqjo.patch | text/x-patch | 11.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Daniel Gustafsson | 2025-11-10 14:21:13 | Re: Minor adjustment: Update the range of the commit_siblings parameter. |
| Previous Message | Heikki Linnakangas | 2025-11-10 14:17:49 | Re: Move SLRU_PAGES_PER_SEGMENT to pg_config_manual.h |