From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | wenhui qiu <qiuwenhuifx(at)gmail(dot)com> |
Cc: | Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andy Fan <zhihuifan1213(at)163(dot)com> |
Subject: | Re: Pathify RHS unique-ification for semijoin planning |
Date: | 2025-08-01 01:53:37 |
Message-ID: | CAMbWs49p2Ci-hokz+GP-EpYYdQEzr3uH3ApyPou+nsECXaJ_4w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jul 31, 2025 at 9:49 PM wenhui qiu <qiuwenhuifx(at)gmail(dot)com> wrote:
> > This seems to be another case where the planner chooses a suboptimal
> > plan due to inaccurate cost estimates.
> Agree ,I increased some rows , set enable_hashagg to on and off ,There's no difference in execution time. The execution plan is the same.
Yeah, I found that if you increase the total number of rows in the
table from 1000 to 1083 or more, you consistently get the more
efficient plan -- regardless of whether enable_hashagg is on or off.
> > #define IS_UNIQUEIFIED_REL(rel, sjinfo, nominal_jointype) \
> > ((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
> > bms_equal((sjinfo)->syn_righthand, (rel)->relids))
> >
> >... and then the check in final_cost_hashjoin() becomes:
> >
> > if (IS_UNIQUEIFIED_REL(inner_path->parent, extra->sjinfo,
> > path->jpath.jointype))
> > {
> > innerbucketsize = 1.0 / virtualbuckets;
> > innermcvfreq = 0.0;
> > }
> >
> >Would this be a better approach? Any thoughts?
> This approach does indeed more accurately capture the fact that the relation has been unique-ified, especially in cases where a semi join has been transformed into an inner join. Compared to the current heuristic checks in costsize.c that rely on inner_path->rows, this method is more semantically meaningful and robust.
The current check in costsize.c relies on the path type, not the path
rows as you mentioned. However, I agree that the check I proposed is
more robust and extensible: if additional path types are introduced to
represent unique-ification, this check wouldn't need to change. So I
plan to go this way, unless there are any objections.
Thanks
Richard
From | Date | Subject | |
---|---|---|---|
Next Message | Chao Li | 2025-08-01 01:57:20 | [PATCH v1] Trivial: ignore IDE-specific files (.vscode/) |
Previous Message | Chao Li | 2025-08-01 00:19:23 | Cannot login CommitFests due to cool-off period |