Re: Use merge-based matching for MCVs in eqjoinsel

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Use merge-based matching for MCVs in eqjoinsel
Date: 2025-09-08 14:36:35
Message-ID: 9f328076-423f-40d3-8012-2d2acc05cbff@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On 08.09.2025 15:45, Ilia Evdokimov wrote:
> I reran the benchmark on a clean cluster and collected the top slowest
> JOB queries — now the effect is clearly visible.
>
> Merge (sum of all JOB queries)
> ==================
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) |
> Planner After (ms)
> --------------------------------------------------------------------------------
> 100                       | *1.00*                | 1888.105           
> | 1879.431
> 1000                      | *1.14*                | 2282.239           
> | 2009.114
> 2500                      | *2.10*                | 5595.030           
> | 2668.530
> 5000                      | *5.56*                | 18544.933          
> | 3333.252
> 7500                      | *9.17*                | 37390.956          
> | 4076.390
> 10000                     | *16.10*               | 69319.479          
> | 4306.417
>
> HashMap (sum of all JOB queries)
> ==================
> default_statistics_target | Planner Speedup (×) | Planner Before (ms) |
> Planner After (ms)
> --------------------------------------------------------------------------------
> 100                     | *1.03*                | 1888.105            |
> 1828.088
> 1000                    | *1.18*                | 2282.239            |
> 1939.884
> 2500                    | *2.64*                | 5595.030            |
> 2117.872
> 5000                    | *7.80*                | 18544.933           |
> 2377.206
> 7500                    | *13.80*               | 37390.956           |
> 2709.973
> 10000                   | *23.32*               | 69319.479           |
> 2973.073
>
> Top 10 slowest JOB queries (default_statistics_target = 10000)
> Query | master (ms) | merge (ms) | Hash (ms)
> ------+-------------+------------+-----------
> 29c   | 1904.586    | 144.135    | 100.473
> 29b   | 1881.392    | 117.891    | 89.028
> 29a   | 1868.805    | 112.242    | 83.913
> 31c   | 1867.234    | 76.498     | 56.140
> 30c   | 1646.630    | 88.494     | 62.549
> 30b   | 1608.820    | 84.821     | 64.603
> 31a   | 1573.964    | 75.978     | 56.140
> 28a   | 1457.738    | 95.939     | 77.309
> 28b   | 1455.052    | 99.383     | 73.065
> 30a   | 1416.699    | 91.057     | 62.549

This looks much better. Very nice!

>
> BTW, the hashmap from your patch could also be applied to
> eqjoinsel_semi() function.
>

Yep. The inner loop only runs until clamped_nvalues2 and it doesn't
compute matchprodfreq. I'll try to modify the patch such that it
accounts for these differences without being too hard to read.

Do you think anything else needs changes in the patch? Did you have a
chance to check tables with just few MCVs or are there any queries in
the JOB which regress with very small default_statistics_target?

--
David Geier

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matheus Alcantara 2025-09-08 15:05:16 Re: Only one version can be installed when using extension_control_path
Previous Message Robert Haas 2025-09-08 13:56:18 Re: plan shape work