Statistical aggregate functions are not working with PARTIAL aggregation

From: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Statistical aggregate functions are not working with PARTIAL aggregation
Date: 2019-05-07 09:09:55
Message-ID: CAKcux6=YBMCntcafSs_22dS1ab6mGay_QUaHx-nvg+_FVPMg3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
As this issue is reproducible without partition-wise aggregate also,
changing email subject from "Statistical aggregate functions are not
working with partitionwise aggregate " to "Statistical aggregate functions
are not working with PARTIAL aggregation".

original reported test case and discussion can be found at below link.
https://www.postgresql.org/message-id/flat/CAKcux6%3DuZEyWyLw0N7HtR9OBc-sWEFeByEZC7t-KDf15FKxVew%40mail.gmail.com

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

On Fri, May 3, 2019 at 5:26 PM Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
wrote:

>
>
> On Fri, May 3, 2019 at 2:56 PM Rajkumar Raghuwanshi <
> rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:
>
>> Hi,
>>
>> On PG-head, Some of statistical aggregate function are not giving correct
>> output when enable partitionwise aggregate while same is working on v11.
>>
>
> I had a quick look over this and observed that something broken with the
> PARTIAL aggregation.
>
> I can reproduce same issue with the larger dataset which results into
> parallel scan.
>
> CREATE TABLE tbl1(a int2,b float4) partition by range(a);
> create table tbl1_p1 partition of tbl1 for values from (minvalue) to (0);
> create table tbl1_p2 partition of tbl1 for values from (0) to (maxvalue);
> insert into tbl1 select i%2, i from generate_series(1, 1000000) i;
>
> # SELECT regr_count(b, a) FROM tbl1;
> regr_count
> ------------
> 0
> (1 row)
>
> postgres:5432 [120536]=# explain SELECT regr_count(b, a) FROM tbl1;
> QUERY
> PLAN
>
> ------------------------------------------------------------------------------------------------
> Finalize Aggregate (cost=15418.08..15418.09 rows=1 width=8)
> -> Gather (cost=15417.87..15418.08 rows=2 width=8)
> Workers Planned: 2
> -> Partial Aggregate (cost=14417.87..14417.88 rows=1 width=8)
> -> Parallel Append (cost=0.00..11091.62 rows=443500
> width=6)
> -> Parallel Seq Scan on tbl1_p2 (cost=0.00..8850.00
> rows=442500 width=6)
> -> Parallel Seq Scan on tbl1_p1 (cost=0.00..24.12
> rows=1412 width=6)
> (7 rows)
>
> postgres:5432 [120536]=# set max_parallel_workers_per_gather to 0;
> SET
> postgres:5432 [120536]=# SELECT regr_count(b, a) FROM tbl1;
> regr_count
> ------------
> 1000000
> (1 row)
>
> After looking further, it seems that it got broken by following commit:
>
> commit a9c35cf85ca1ff72f16f0f10d7ddee6e582b62b8
> Author: Andres Freund <andres(at)anarazel(dot)de>
> Date: Sat Jan 26 14:17:52 2019 -0800
>
> Change function call information to be variable length.
>
>
> This commit is too big to understand and thus could not get into the
> excact cause.
>
> Thanks
>
>
>> below are some of examples.
>>
>> CREATE TABLE tbl(a int2,b float4) partition by range(a);
>> create table tbl_p1 partition of tbl for values from (minvalue) to (0);
>> create table tbl_p2 partition of tbl for values from (0) to (maxvalue);
>> insert into tbl values (-1,-1),(0,0),(1,1),(2,2);
>>
>> --when partitionwise aggregate is off
>> postgres=# SELECT regr_count(b, a) FROM tbl;
>> regr_count
>> ------------
>> 4
>> (1 row)
>> postgres=# SELECT regr_avgx(b, a), regr_avgy(b, a) FROM tbl;
>> regr_avgx | regr_avgy
>> -----------+-----------
>> 0.5 | 0.5
>> (1 row)
>> postgres=# SELECT corr(b, a) FROM tbl;
>> corr
>> ------
>> 1
>> (1 row)
>>
>> --when partitionwise aggregate is on
>> postgres=# SET enable_partitionwise_aggregate = true;
>> SET
>> postgres=# SELECT regr_count(b, a) FROM tbl;
>> regr_count
>> ------------
>> 0
>> (1 row)
>> postgres=# SELECT regr_avgx(b, a), regr_avgy(b, a) FROM tbl;
>> regr_avgx | regr_avgy
>> -----------+-----------
>> |
>> (1 row)
>> postgres=# SELECT corr(b, a) FROM tbl;
>> corr
>> ------
>>
>> (1 row)
>>
>> Thanks & Regards,
>> Rajkumar Raghuwanshi
>> QMG, EnterpriseDB Corporation
>>
>
>
> --
> Jeevan Chalke
> Technical Architect, Product Development
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2019-05-07 12:06:24 Re: Statistical aggregate functions are not working with PARTIAL aggregation
Previous Message Rafia Sabih 2019-05-07 09:03:23 Re: New EXPLAIN option: ALL