From: | wenhui qiu <qiuwenhuifx(at)gmail(dot)com> |
---|---|
To: | Andy Fan <zhihuifan1213(at)163(dot)com> |
Cc: | Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Pathify RHS unique-ification for semijoin planning |
Date: | 2025-05-22 13:51:32 |
Message-ID: | CAGjGUAJ79-dCypHKpi+yzQkSVi9b+EDK0+BQ2VLjEw6W-5a_eQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 22 May 2025 at 17:28, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
>
> Hi,
>
> > However, in the case of sort-based implementation,
> > this function pays no attention to the subpath's pathkeys or the
> > pathkeys of the resulting output.
>
> Good finding!
>
> >
> > In addition to this specific issue, it seems to me that there are
> > other potential issues in create_unique_path().
> >
> > * Currently, we only consider the cheapest_total_path of the RHS when
> > unique-ifying it.
>
> I think it is better have a check the tuple_fraction for the startup_cost
> factor, for some paths where the total cost is high but the required
> fraction is lower.
>
> > I think this may cause us to miss some optimization
> > opportunities. For example, there might be a path with a better sort
> > order that isn't the cheapest-total one. Such a path could help avoid
> > a sort at a higher level, potentially resulting in a cheaper overall
> > plan.
>
> > * In create_unique_path(), we currently rely on heuristics to decide
> > whether to use a hash-based or sort-based method. I think a better
> > approach would be to create paths for both methods and let add_path()
> > determine which one is better, similar to how we handle path selection
> > in other parts of the planner.
> >
> > Therefore, I'm thinking that maybe we could create a new RelOptInfo
> > for the RHS rel to represent its unique-ified version, and then
> > generate all worthwhile paths for it,
>
> This sounds great for me. and I think we can keep the fraction
> cheapest path on the new RelOptInfo as well, then all the things should
> be on the way.
>
> > To be concrete, I'm envisioning something like the following:
> >
> > if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
> > - create_unique_path(root, rel2, rel2->cheapest_total_path,
> > - sjinfo) != NULL)
> > + (rel2_unique = create_unique_rel(root, rel2, sjinfo)) !=
> NULL)
> >
> > ...
> >
> > - add_paths_to_joinrel(root, joinrel, rel1, rel2,
> > - JOIN_UNIQUE_INNER, sjinfo,
> > + add_paths_to_joinrel(root, joinrel, rel1, rel2_unique,
> > + JOIN_INNER, sjinfo,
> > restrictlist);
> > - add_paths_to_joinrel(root, joinrel, rel2, rel1,
> > - JOIN_UNIQUE_OUTER, sjinfo,
> > + add_paths_to_joinrel(root, joinrel, rel2_unique, rel1,
> > + JOIN_INNER, sjinfo,
> > restrictlist);
> >
> > In addition, by changing the join from "rel1" and "rel2" using
> > JOIN_UNIQUE_OUTER or JOIN_UNIQUE_INNER to a join between "rel1" and
> > "rel2_unique" using a standard JOIN_INNER, we might be able to get
> > rid of the JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER jointypes.
>
> >>if we can, +10.
Agree
Pls kindly release a path for this?
>
>
>
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2025-05-22 14:02:05 | Re: [PoC] XMLCast (SQL/XML X025) |
Previous Message | Robert Haas | 2025-05-22 13:50:22 | Re: generic plans and "initial" pruning |