Re: Use merge-based matching for MCVs in eqjoinsel

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Use merge-based matching for MCVs in eqjoinsel
Date: 2025-11-19 02:19:46
Message-ID: 1371403.1763518786@sss.pgh.pa.us
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Thinking a bit harder, we are comparing these costs:
> [ theoretical arguments trimmed ]

I spent some effort on actually measuring timings of the v6 patch,
and concluded that this is all splitting hairs that we don't need
to split. The actual crossover between hash-loses and hash-wins
is more than what my theoretical argument suggested, but still
probably less than 100 MCVs on each side. I think we should go with

(sslot1.nvalues + sslot2.nvalues) >= 200

and call it good.

To arrive at this result, I built the v6 patchset with
EQJOINSEL_MCV_HASH_THRESHOLD changed to either 0 (to force hashing)
or 1000000 (to prevent it). I then ran the attached scripts with
different values of "nstats" and collected timings from the postmaster
log output produced by the 0001 patch.

The scripts are designed to test both the cheap-comparisons scenario
(integer columns) and the expensive-comparisons scenario (text columns
with a case-insensitive ICU collation). My motivation for splitting
them into a setup and a test step was to allow the tests to be run
repeatedly against the same underlying data. (Although I soon realized
that because VACUUM ANALYZE takes a random sample each time, the stats
we're working from aren't totally the same each time anyway.) Also
you'll notice that the test data is based on log(random()), which
I did to roughly approximate a zipfian distribution. If you remove
the log() call you'll get a flat distribution instead, but it didn't
seem to change the conclusions much.

regards, tom lane

Attachment Content-Type Size
eqseltest1-log.sql text/plain 678 bytes
eqseltest2.sql text/plain 579 bytes

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-11-19 02:28:35 Re: PRI?64 vs Visual Studio (2022)
Previous Message Richard Guo 2025-11-19 02:18:23 Re: Fix typos in ExecChooseHashTableSize()