Re: Trying to pull up EXPR SubLinks

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Trying to pull up EXPR SubLinks
Date: 2020-03-02 08:21:18
Message-ID: CAMbWs4-Je0y28o29ExX9qMovOO4xXmuAWJSDwphkpeXXX+cv-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 28, 2020 at 11:35 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > On Fri, Feb 28, 2020 at 3:02 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
> wrote:
> >> Glad to see this. I think the hard part is this transform is not
> *always*
> >> good. for example foo.a only has 1 rows, but bar has a lot of rows, if
> >> so the original would be the better one.
>
> > Yes exactly. TBH I'm not sure how to achieve that.
>
> Yeah, I was about to make the same objection when I saw Andy already had.
> Without some moderately-reliable way of estimating whether the change
> is actually a win, I think we're better off leaving it out. The user
> can always rewrite the query for themselves if the grouped implementation
> would be better -- but if the planner just does it blindly, there's no
> recourse when it's worse.
>

Yes, that makes sense.

>
> > Any ideas on this part?
>
> I wonder whether it'd be possible to rewrite the query, but then
> consider two implementations, one where the equality clause is
> pushed down into the aggregating subquery as though it were LATERAL.
> You'd want to be able to figure out that the presence of that clause
> made it unnecessary to do the GROUP BY ... but having done so, a
> plan treating the aggregating subquery as LATERAL ought to be pretty
> nearly performance-equivalent to the current way. So this could be
> mechanized in the current planner structure by treating that as a
> parameterized path for the subquery, and comparing it to unparameterized
> paths that calculate the full grouped output.
>

I suppose this would happen in/around function set_subquery_pathlist.
When we generate access paths for the subquery, we try to push down the
equality clause into subquery, remove the unnecessary GROUP BY, etc.
and then perform another run of subquery_planner to generate the
parameterized path, and add it to the RelOptInfo for the subquery. So
that we can do comparison to unparameterized paths.

Am I understanding it correctly?

>
> Obviously it'd be a long slog from here to there, but it seems like
> maybe that could be made to work. There's a separate question about
> whether it's really worth the trouble, seeing that the optimization
> is available today to people who rewrite their queries.
>

If I understand correctly as above, yes, this would take quite a lot of
effort. Not sure if it's still worth doing.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-03-02 08:29:30 Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
Previous Message Peter Eisentraut 2020-03-02 07:57:33 Re: Cleanup - Removal of apply_typmod function in #if 0