Re: Fixing some ancient errors in hash join costing

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Fixing some ancient errors in hash join costing
Date: 2025-12-29 06:19:05
Message-ID: F364A5CB-C518-4D28-8778-50F2F6BE9BDE@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Dec 29, 2025, at 11:29, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> writes:
>> Just a few small comments on v2:
>
>> This is not a concern, just curious why switch the setting order of enable_hashjoin and enable_sort?
>
> The need to disable hashjoin only applies to one of the two tests, so
> it seemed to me that this way is more nicely nested. Judgment call
> of course.

Make sense. Making enable_hashjoin closer to the test that depends on it and immediately resetting it after the test, which is clearer.

>
>> As flag 0 is passed to get_attstatsslot(), free_attstatsslot() is not needed.
>
> True. I wrote it like that so people wouldn't wonder if I'd forgotten
> free_attstatsslot(), but if other call sites passing flags == 0 don't
> use it then it'd be better to be consistent. (I didn't check that.)

I searched over the source tree, looks like not a direct reference. The only usages of flag 0 is in eqjoinsel(), the code snippet is:
```
/*
* There is no use in fetching one side's MCVs if we lack MCVs for the
* other side, so do a quick check to verify that both stats exist.
*/
get_mcv_stats = (HeapTupleIsValid(vardata1.statsTuple) &&
HeapTupleIsValid(vardata2.statsTuple) &&
get_attstatsslot(&sslot1, vardata1.statsTuple,
STATISTIC_KIND_MCV, InvalidOid,
0) &&
get_attstatsslot(&sslot2, vardata2.statsTuple,
STATISTIC_KIND_MCV, InvalidOid,
0));

if (HeapTupleIsValid(vardata1.statsTuple))
{
/* note we allow use of nullfrac regardless of security check */
stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
if (get_mcv_stats &&
statistic_proc_security_check(&vardata1, opfuncoid))
have_mcvs1 = get_attstatsslot(&sslot1, vardata1.statsTuple,
STATISTIC_KIND_MCV, InvalidOid,
ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
}

if (HeapTupleIsValid(vardata2.statsTuple))
{
/* note we allow use of nullfrac regardless of security check */
stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
if (get_mcv_stats &&
statistic_proc_security_check(&vardata2, opfuncoid))
have_mcvs2 = get_attstatsslot(&sslot2, vardata2.statsTuple,
STATISTIC_KIND_MCV, InvalidOid,
ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS);
}
```

Here sslot1 and sslot2 are called with flag 0 first, then when called again with non-0 flags, sslot1 and sslot2 are not free-ed first, which implies that free_attstatsslot() is not need to be called.

If a reader considers free_attstatsslot() is necessary in this patch, then he might concern memory leaks in eqjoinsel().

>
>> Maybe worth adding a short comment like “0.0 doesn’t mean zero frequency, instead 0.0 means no data or unknown frequency”, which might help code readers to quickly understand the logic.
>
> Doesn't the function's header comment cover that adequately?

Yep, fair point. The function comment does explain that.

My thought was just that at the point of initialization, the nearby comment reads as if we’re about to fetch a value, whereas the assignment is really initializing with “unknown so far”. A short tweak like “Initialize MCV frequency to unknown” might help make that intent obvious locally, but I’m fine either way.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message VASUKI M 2025-12-29 06:23:57 Re: [PATCH] psql: tab completion for ALTER ROLE ... IN DATABASE ...
Previous Message shveta malik 2025-12-29 06:02:38 Re: Proposal: Conflict log history table for Logical Replication