| From: | David Geier <geidav(dot)pg(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Use merge-based matching for MCVs in eqjoinsel |
| Date: | 2025-11-18 17:54:16 |
| Message-ID: | 33fd5d63-19cd-4fff-b741-0e7af45df52f@gmail.com |
| Lists: | pgsql-hackers |
Hi Tom!
On 17.11.2025 19:44, Tom Lane wrote:
> I wrote:
>> Actually, after sleeping on it it seems like the obvious thing is
>> to test "sslot1.nvalues * sslot2.nvalues", since the work we are
>> thinking about saving scales as that product. But I'm not sure
>> what threshold value to use if we do that. Maybe around 10000?
>
> Or maybe better, since we are considering an O(m*n) algorithm
> versus an O(m+n) one, we could check whether
>
> sslot1.nvalues * sslot2.nvalues - (sslot1.nvalues + sslot2.nvalues)
>
> exceeds some threshold. But that doesn't offer any insight into
> just what the threshold should be, either.
Good idea. How about using that formula and then determining the
threshold with a few experiments? Could be the JOB benchmark Ilia has
already set up or some synthetic test-cases.
Given that there's no one-size-fits-all constant anyway, that seems
good enough to me. Looking at [1], the value of 9 for
MIN_ARRAY_SIZE_FOR_HASHED_SAOP was determined the same way.
We could also include the operator costs for hashing and equality
comparison to make it more precise, in case they're easily accessible
at this point.
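To make the proposed check concrete, here is a minimal sketch of the
test Tom describes, comparing the O(m*n) work against the O(m+n) work.
The function name and the constant are hypothetical; the value 10000 is
only a placeholder borrowed from the earlier suggestion and would have
to be replaced by whatever the experiments show. Operator costs are not
modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder; the real value is to be found by experiment. */
#define EQJOINSEL_MCV_MATCH_THRESHOLD 10000

/*
 * Decide whether the linear-time MCV matching is worth its setup cost.
 * nvalues1 and nvalues2 stand for sslot1.nvalues and sslot2.nvalues.
 * Returns true when the comparisons saved, m*n - (m+n), exceed the
 * threshold. 64-bit arithmetic avoids any overflow concern.
 */
static bool
use_linear_mcv_matching(int nvalues1, int nvalues2)
{
    int64_t m = nvalues1;
    int64_t n = nvalues2;

    return (m * n - (m + n)) > EQJOINSEL_MCV_MATCH_THRESHOLD;
}
```

With this placeholder, two MCV lists of 100 entries each would still use
the nested loop (9800 saved comparisons), while two lists of 102 entries
each would switch (10200).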
--
David Geier