| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Fixing some ancient errors in hash join costing |
| Date: | 2025-12-29 00:39:38 |
| Message-ID: | 705AC80B-8E3A-4DA2-9FC2-A05C4D6C70FD@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Dec 29, 2025, at 06:56, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
>> I was amused to notice that the postgres_fdw.out change made in my
>> patch reverts one made in aa86129e1 (which also affected semijoin
>> costing). So we've had trouble before with that test case being
>> fundamentally unstable. I wonder if we shouldn't do something to try
>> to stabilize it? I see that the test immediately before this one
>> forces the matter by turning off enable_sort (which'd affect only
>> the local side not the remote). That's a hack all right but maybe
>> we should extend it to this test.
>
> Here's a v2 patchset that splits that out as a separate change for
> clarity's sake. I also spent a bit of effort on commit log messages,
> including researching the git history.
>
> regards, tom lane
>
Just a few small comments on v2:
1 - 0001
```
-SET enable_hashjoin TO off;
+-- These next tests require choosing between remote and local sort, which is
+-- a coin flip so long as cost_sort() gives the same results on both sides.
+-- To stabilize the expected plans, disable sorting locally.
SET enable_sort TO off;
-- subquery using stable function (can't be sent to remote)
+SET enable_hashjoin TO off; -- this one needs even more help to be stable
```
This is not a concern, just curious why switch the setting order of enable_hashjoin and enable_sort?
2 - 0002
```
+ else if (get_attstatsslot(&sslot, vardata.statsTuple,
+ STATISTIC_KIND_HISTOGRAM, InvalidOid,
+ 0))
+ {
+ /*
+ * If there are no recorded MCVs, but we do have a histogram, then
+ * assume that ANALYZE determined that the column is unique.
+ */
+ if (vardata.rel && vardata.rel->rows > 0)
+ *mcv_freq = 1.0 / vardata.rel->rows;
+ free_attstatsslot(&sslot);
+ }
```
As flag 0 is passed to get_attstatsslot(), free_attstatsslot() is not needed. The function comment of get_attstatsslot states about that:
```
* Passing flags=0 can be useful to quickly check if the requested slot type
* exists. In this case no arrays are extracted, so free_attstatsslot need
* not be called.
```
3 - In funciton estimate_hash_bucket_stats(), when mcv_freq is initialized:
```
/* Look up the frequency of the most common value, if available */
*mcv_freq = 0.0;
```
Maybe worth adding a short comment like “0.0 doesn’t mean zero frequency, instead 0.0 means no data or unknown frequency”, which might help code readers to quickly understand the logic.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2025-12-29 00:52:07 | Re: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE |
| Previous Message | Michael Paquier | 2025-12-29 00:24:29 | Re: Regression with large XML data input |