Re: [PATCH] Erase the distinctClause if the result is unique by definition

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date: 2020-03-04 13:13:54
Message-ID: CAKU4AWpOd1ALDMZCQ4hMXDzU6fE8Auuqk2sy3DKWAXsKsbJs0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
>
>> * There are some changes in existing regression cases that aren't
>> visibly related to the stated purpose of the patch, eg it now
>> notices that "select distinct max(unique2) from tenk1" doesn't
>> require an explicit DISTINCT step. That's not wrong, but I wonder
>> if maybe you should subdivide this patch into more than one patch,
>> because that must be coming from some separate change. I'm also
>> wondering what caused the plan change in expected/join.out.
>>
>
> Per my purpose it should be in the same patch, the logical here is we
> have distinct in the sql and the query is distinct already since the max
> function (the rule is defined in query_is_distinct_agg which is splited
> from
> the original query_is_distinct_for clause).
>

I think I was right until I come
into contrib/postgres_fdw/sql/postgres_fdw.sql.
Per my understanding, the query the result of "select max(a) from t" is
unique
since the aggregation function and has no group clause there. But in the
postgres_fdw.sql case, the Query->hasAggs is true for "select distinct
(select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where
t1.c1 = 6)
from ft2 t2 where t2.c2 % 6 = 0 order by 1;" This looks very strange to
me.
Is my understanding wrong or there is a bug here?

query->hasAggs was set to true in the following call stack.

pstate->p_hasAggs = true;

..

qry->hasAggs = pstate->p_hasAggs;

0 in check_agglevels_and_constraints of parse_agg.c:343
1 in transformAggregateCall of parse_agg.c:236
2 in ParseFuncOrColumn of parse_func.c:805
3 in transformFuncCall of parse_expr.c:1558
4 in transformExprRecurse of parse_expr.c:265
5 in transformExpr of parse_expr.c:155
6 in transformTargetEntry of parse_target.c:105
7 in transformTargetList of parse_target.c:193
8 in transformSelectStmt of analyze.c:1224
9 in transformStmt of analyze.c:301

You can see the new updated patch which should fix all the issues you point
out
except the one for supporting group by. The another reason for this patch
will
not be the final one is because the changes for postgres_fdw.out is too
arbitrary.
uploading it now just for reference. (The new introduced guc variable can
be
removed at last, keeping it now just make sure the testing is easier.)

At a high level, I'm a bit disturbed that this focuses only on DISTINCT
>>> and doesn't (appear to) have any equivalent intelligence for GROUP BY,
>>> though surely that offers much the same opportunities for optimization.
>>> It seems like it'd be worthwhile to take a couple steps back and see
>>> if we couldn't recast the logic to work for both.
>>>
>>>
>> OK, Looks group by is a bit harder than distinct since the aggregation
>> function.
>> I will go through the code to see where to add this logic.
>>
>>
>
> Can we grantee any_aggr_func(a) == a if only 1 row returned, if so, we
> can do
> some work on the pathtarget/reltarget by transforming the Aggref to raw
> expr.
> I checked the execution path of the aggregation call, looks it depends on
> Agg node
> which is the thing we want to remove.
>

We can't grantee any_aggr_func(a) == a when only 1 row returned, so the
above
method doesn't work. do you have any suggestion for this?

Attachment Content-Type Size
v2-0001-PATCH-Erase-the-distinct-path-if-the-result-is-un.patch application/x-patch 37.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Arseny Sher 2020-03-04 13:29:44 Re: ERROR: subtransaction logged without previous top-level txn record
Previous Message Michael Paquier 2020-03-04 13:05:30 Re: PG_COLOR not mentioned in docs of vacuumlo, oid2name and pgbench