From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com> |
Cc: | Álvaro Herrera <alvherre(at)kurilemu(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andy Fan <zhihuifan1213(at)163(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com> |
Subject: | Re: Pathify RHS unique-ification for semijoin planning |
Date: | 2025-07-31 08:58:28 |
Message-ID: | CAMbWs489K=90wFp-ocTSmVNpzFydcgJHJpOSp7q23f39_vr6=Q@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jul 31, 2025 at 1:08 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> On Thu, Jul 31, 2025 at 10:33 AM Alexandra Wang
> <alexandra(dot)wang(dot)oss(at)gmail(dot)com> wrote:
> > While looking at the code, I also had a question about the following
> > changes in costsize.c:
> >
> > --- a/src/backend/optimizer/path/costsize.c
> > +++ b/src/backend/optimizer/path/costsize.c
> > @@ -3963,7 +3963,9 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
> > * The whole issue is moot if we are working from a unique-ified outer
> > * input, or if we know we don't need to mark/restore at all.
> > */
> > - if (IsA(outer_path, UniquePath) || path->skip_mark_restore)
> > + if (IsA(outer_path, UniquePath) ||
> > + IsA(outer_path, AggPath) ||
> > + path->skip_mark_restore)
> >
> > and
> >
> > @@ -4358,7 +4360,7 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
> > * because we avoid contaminating the cache with a value that's wrong for
> > * non-unique-ified paths.
> > */
> > - if (IsA(inner_path, UniquePath))
> > + if (IsA(inner_path, UniquePath) || IsA(inner_path, AggPath))
> >
> > I'm curious why AggPath was added in these two cases.
> Well, in final_cost_hashjoin() and final_cost_mergejoin(), we have
> some special cases when the inner or outer relation has been
> unique-ified. Previously, it was sufficient to check whether the path
> was a UniquePath, since both hash-based and sort-based implementations
> were represented that way. However, with this patch, UniquePath now
> only represents the sort-based implementation, so we also need to
> check for AggPath to account for the hash-based case.
BTW, maybe a better way to determine whether a relation has been
unique-ified is to check that the nominal jointype is JOIN_INNER and
sjinfo->jointype is JOIN_SEMI, and the relation is exactly the RHS of
the semijoin. This approach is mentioned in a comment in joinpath.c:
* Path cost estimation code may need to recognize that it's
* dealing with such a case --- the combination of nominal jointype INNER
* with sjinfo->jointype == JOIN_SEMI indicates that.
... but it seems we don't currently apply it in costsize.c.
To be concrete, I'm imagining a check like the following:
#define IS_UNIQUEIFIED_REL(rel, sjinfo, nominal_jointype) \
((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
bms_equal((sjinfo)->syn_righthand, (rel)->relids))
... and then the check in final_cost_hashjoin() becomes:
if (IS_UNIQUEIFIED_REL(inner_path->parent, extra->sjinfo,
path->jpath.jointype))
{
innerbucketsize = 1.0 / virtualbuckets;
innermcvfreq = 0.0;
}
Would this be a better approach? Any thoughts?
Thanks
Richard
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2025-07-31 09:07:33 | Re: 024_add_drop_pub.pl might fail due to deadlock |
Previous Message | Japin Li | 2025-07-31 08:20:00 | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |