From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Use merge-based matching for MCVs in eqjoinsel
Date: 2025-09-08 10:56:07
Message-ID: 412007b6-a324-44f5-91ed-c869004854ec@gmail.com
Lists: pgsql-hackers

Hi Ilia!
> I have read all the previous messages - and yes, you are right. I don’t
> know why I didn’t consider using a hash table approach initially. Your
> idea makes a lot of sense.
Your merge-based solution would still be beneficial on top of that, for
cases where the data type is not hashable. But I think that's overkill
for a v1. I would start with the hash-based version.
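
To make the idea concrete, here is a simplified, self-contained sketch of
the hash-based matching (not the actual patch): build a hash table over one
MCV list and probe it with the other, instead of the O(n1 * n2)
equality-operator loop. The value type (plain int), the hash function, and
the open-addressing table are placeholder assumptions so the example
compiles outside of PostgreSQL; the real code has to hash Datums with the
type's hash function and confirm matches with the join's equality operator.

/*
 * Illustrative sketch only: hash-based matching of two MCV lists.
 * hasmatch1/hasmatch2 must be zero-initialized by the caller.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct HashEntry
{
    int     value;              /* MCV value from list 1 */
    int     index;              /* position in mcv1, or -1 for an empty slot */
} HashEntry;

static unsigned int
hash_int(int value)
{
    return (unsigned int) value * 2654435761u; /* Knuth multiplicative hash */
}

static void
match_mcvs_hashed(const int *mcv1, int n1, const int *mcv2, int n2,
                  bool *hasmatch1, bool *hasmatch2)
{
    int         nbuckets = 1;
    HashEntry  *table;

    while (nbuckets < n1 * 2)           /* keep the load factor below 0.5 */
        nbuckets <<= 1;

    table = calloc(nbuckets, sizeof(HashEntry));
    for (int i = 0; i < nbuckets; i++)
        table[i].index = -1;

    /* Build phase: insert every MCV of list 1 (linear probing). */
    for (int i = 0; i < n1; i++)
    {
        int     slot = hash_int(mcv1[i]) & (nbuckets - 1);

        while (table[slot].index != -1)
            slot = (slot + 1) & (nbuckets - 1);
        table[slot].value = mcv1[i];
        table[slot].index = i;
    }

    /* Probe phase: look up every MCV of list 2. */
    for (int j = 0; j < n2; j++)
    {
        int     slot = hash_int(mcv2[j]) & (nbuckets - 1);

        while (table[slot].index != -1)
        {
            if (table[slot].value == mcv2[j])
            {
                hasmatch1[table[slot].index] = true;
                hasmatch2[j] = true;
                break;
            }
            slot = (slot + 1) & (nbuckets - 1);
        }
    }
    free(table);
}

int
main(void)
{
    int     mcv1[] = {1, 3, 5, 7, 9};
    int     mcv2[] = {2, 3, 4, 9};
    bool    hasmatch1[5] = {false};
    bool    hasmatch2[4] = {false};

    match_mcvs_hashed(mcv1, 5, mcv2, 4, hasmatch1, hasmatch2);
    for (int j = 0; j < 4; j++)
        printf("mcv2[%d]=%d matched: %d\n", j, mcv2[j], hasmatch2[j]);
    return 0;
}

The build cost is O(n1) and each probe is expected O(1), so the whole
matching step becomes roughly linear in the combined MCV list sizes,
which is where the planning-time wins at large default_statistics_target
come from.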
> To evaluate it, I ran benchmarks on JOB with three variants:
>
> $ ./benchmark.sh master
> $ ./benchmark.sh merge
> $ ./benchmark.sh hash
>
> I compared total planning time across all 113 queries.
Was this run with an optimized build? How did you extract the planning time?
>
> $ python3 planning_time.py master hash
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> ------------------------------------------------------------------------------------------
>                       100 |                1.00 |            1892.627 |           1886.969
>                      1000 |                1.09 |            2286.922 |           2100.099
>                      2500 |                1.94 |            4647.167 |           2400.711
>                      5000 |                6.15 |           17964.779 |           2919.914
>                      7500 |               10.58 |           38622.443 |           3650.375
>                     10000 |               16.33 |           69538.085 |           4257.864
>
> $ python3 planning_time.py master merge
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
> ------------------------------------------------------------------------------------------
>                       100 |                1.00 |            1892.627 |           1898.622
>                      1000 |                1.12 |            2286.922 |           2033.553
>                      2500 |                1.92 |            4647.167 |           2423.552
>                      5000 |                5.94 |           17964.779 |           3025.739
>                      7500 |               10.48 |           38622.443 |           3684.262
>                     10000 |               16.72 |           69538.085 |           4159.418
I would have expected the delta between the "merge" and "hash" variants
to be bigger, especially for default_statistics_target=10000. My small
test did show a larger difference. Any idea why that's not showing up in
your results?
> Based on these results, I’d prefer the hash lookup implementation, so I
> think it makes sense to improve your patch further and bring it into
> good shape. Shall I take care of that, or would you prefer to do it
> yourself?
I think, process-wise, it's best if you review my code and I make the changes.
Could you, as part of your review, test tables with just a few MCVs to
make sure we're not regressing "small" cases? For now, I only bail out
if one of the two MCV lists has just a single value. I expect the gains
from fine-tuning this threshold to be unmeasurable, but let's
double-check.
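
For reference, the bail-out is roughly the following shape, building on the
sketch earlier in this mail. This is illustrative only; the function names
and the exact threshold are not taken from the actual patch.

/*
 * Illustrative only: dispatch between the existing pairwise comparison and
 * the hash-based matching from the sketch above. The draft currently bails
 * out only when one of the MCV lists has a single entry; reviewing "small"
 * cases means checking whether this threshold needs to be higher.
 */
static void
match_mcvs(const int *mcv1, int n1, const int *mcv2, int n2,
           bool *hasmatch1, bool *hasmatch2)
{
    if (n1 <= 1 || n2 <= 1)
    {
        /* Existing behaviour: compare every pair directly. */
        for (int i = 0; i < n1; i++)
            for (int j = 0; j < n2; j++)
                if (mcv1[i] == mcv2[j])
                    hasmatch1[i] = hasmatch2[j] = true;
    }
    else
        match_mcvs_hashed(mcv1, n1, mcv2, n2, hasmatch1, hasmatch2);
}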
--
David Geier