Re: [PATCH] Erase the distinctClause if the result is unique by definition

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date: 2020-03-06 11:46:51
Message-ID: CAKU4AWqM_HM1HHVnFhYx2dhnOwV0HGOGgPtFOO4ogz1FFtLKQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Upload the newest patch so that the cfbot can pass. The last patch failed
because some explain without the (cost off).

I'm still on the way to figure out how to handle aggregation calls without
aggregation path. Probably we can get there by hacking some
ExprEvalPushStep for Aggref node. But since the current patch not tied
with this closely, so I would put this patch for review first.

On Wed, Mar 4, 2020 at 9:13 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:

>
>
>>
>>> * There are some changes in existing regression cases that aren't
>>> visibly related to the stated purpose of the patch, eg it now
>>> notices that "select distinct max(unique2) from tenk1" doesn't
>>> require an explicit DISTINCT step. That's not wrong, but I wonder
>>> if maybe you should subdivide this patch into more than one patch,
>>> because that must be coming from some separate change. I'm also
>>> wondering what caused the plan change in expected/join.out.
>>>
>>
>> Per my purpose it should be in the same patch, the logical here is we
>> have distinct in the sql and the query is distinct already since the max
>> function (the rule is defined in query_is_distinct_agg which is splited
>> from
>> the original query_is_distinct_for clause).
>>
>
> I think I was right until I come
> into contrib/postgres_fdw/sql/postgres_fdw.sql.
> Per my understanding, the query the result of "select max(a) from t" is
> unique
> since the aggregation function and has no group clause there. But in the
> postgres_fdw.sql case, the Query->hasAggs is true for "select distinct
> (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where
> t1.c1 = 6)
> from ft2 t2 where t2.c2 % 6 = 0 order by 1;" This looks very strange to
> me.
> Is my understanding wrong or there is a bug here?
>
> query->hasAggs was set to true in the following call stack.
>
> pstate->p_hasAggs = true;
>
> ..
>
> qry->hasAggs = pstate->p_hasAggs;
>
>
> 0 in check_agglevels_and_constraints of parse_agg.c:343
> 1 in transformAggregateCall of parse_agg.c:236
> 2 in ParseFuncOrColumn of parse_func.c:805
> 3 in transformFuncCall of parse_expr.c:1558
> 4 in transformExprRecurse of parse_expr.c:265
> 5 in transformExpr of parse_expr.c:155
> 6 in transformTargetEntry of parse_target.c:105
> 7 in transformTargetList of parse_target.c:193
> 8 in transformSelectStmt of analyze.c:1224
> 9 in transformStmt of analyze.c:301
>
> You can see the new updated patch which should fix all the issues you
> point out
> except the one for supporting group by. The another reason for this
> patch will
> not be the final one is because the changes for postgres_fdw.out is too
> arbitrary.
> uploading it now just for reference. (The new introduced guc variable can
> be
> removed at last, keeping it now just make sure the testing is easier.)
>
>
> At a high level, I'm a bit disturbed that this focuses only on DISTINCT
>>>> and doesn't (appear to) have any equivalent intelligence for GROUP BY,
>>>> though surely that offers much the same opportunities for optimization.
>>>> It seems like it'd be worthwhile to take a couple steps back and see
>>>> if we couldn't recast the logic to work for both.
>>>>
>>>>
>>> OK, Looks group by is a bit harder than distinct since the aggregation
>>> function.
>>> I will go through the code to see where to add this logic.
>>>
>>>
>>
>> Can we grantee any_aggr_func(a) == a if only 1 row returned, if so, we
>> can do
>> some work on the pathtarget/reltarget by transforming the Aggref to raw
>> expr.
>> I checked the execution path of the aggregation call, looks it depends on
>> Agg node
>> which is the thing we want to remove.
>>
>
> We can't grantee any_aggr_func(a) == a when only 1 row returned, so the
> above
> method doesn't work. do you have any suggestion for this?
>

Attachment Content-Type Size
v3-0001-PATCH-Erase-the-distinct-path-if-the-result-is-un.patch application/octet-stream 35.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kartyshov Ivan 2020-03-06 12:21:49 Re: [HACKERS] make async slave to wait for lsn to be replayed
Previous Message Arseny Sher 2020-03-06 11:02:53 Re: logical copy_replication_slot issues