Quick Links

Re: [PoC] Implementation of distinct in Window Aggregates

From:	Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PoC] Implementation of distinct in Window Aggregates
Date:	2022-12-29 15:28:58
Message-ID:	2ef6b491-1946-b606-f064-d9ea79d91463@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 24/12/22 18:22, Ankit Pandey wrote:
> Hi,
>
> This is a PoC patch which implements distinct operation in window
> aggregates (without order by and for single column aggregation, final
> version may vary wrt these limitations). Purpose of this PoC is to get
> feedback on the approach used and corresponding implementation, any
> nitpicking as deemed reasonable.
>
> Distinct operation is mirrored from implementation in nodeAgg.
> Existing partitioning logic determines if row is in partition and when
> distinct is required, all tuples for the aggregate column are stored
> in tuplesort. When finalize_windowaggregate gets called, tuples are
> sorted and duplicates are removed, followed by calling the transition
> function on each tuple.
> When distinct is not required, the above process is skipped and the
> transition function gets called directly and nothing gets inserted
> into tuplesort.
> Note: For each partition, in tuplesort_begin and tuplesort_end is
> involved to rinse tuplesort, so at any time, max tuples in tuplesort
> is equal to tuples in a particular partition.
>
> I have verified it for interger and interval column aggregates (to
> rule out obvious issues related to data types).
>
> Sample cases:
>
> create table mytable(id int, name text);
> insert into mytable values(1, 'A');
> insert into mytable values(1, 'A');
> insert into mytable values(5, 'B');
> insert into mytable values(3, 'A');
> insert into mytable values(1, 'A');
>
> select avg(distinct id) over (partition by name) from mytable;
> avg
> --------------------
> 2.0000000000000000
> 2.0000000000000000
> 2.0000000000000000
> 2.0000000000000000
> 5.0000000000000000
>
> select avg(id) over (partition by name) from mytable;
> avg
> --------------------
> 1.5000000000000000
> 1.5000000000000000
> 1.5000000000000000
> 1.5000000000000000
> 5.0000000000000000
>
> select avg(distinct id) over () from mytable;
> avg
> --------------------
> 3.0000000000000000
> 3.0000000000000000
> 3.0000000000000000
> 3.0000000000000000
> 3.0000000000000000
>
> select avg(distinct id) from mytable;
> avg
> --------------------
> 3.0000000000000000
>
> This is my first-time contribution. Please let me know if anything can be
> improved as I`m eager to learn.
>
> Regards,
> Ankit Kumar Pandey

Hi all,

I know everyone is busy with holidays (well, Happy Holidays!) but I will
be glad if someone can take a quick look at this PoC and share thoughts.

This is my first time contribution so I am pretty sure there will be
some very obvious feedbacks (which will help me to move forward with
this change).

--
Regards,
Ankit Kumar Pandey

In response to

[PoC] Implementation of distinct in Window Aggregates at 2022-12-24 12:52:03 from Ankit Pandey

Responses

Re: [PoC] Implementation of distinct in Window Aggregates at 2023-01-04 12:40:32 from Ankit Kumar Pandey

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2022-12-29 17:15:55	Re: split TOAST support out of postgres.h
Previous Message	Amit Kapila	2022-12-29 13:24:53	Re: Perform streaming logical transactions by background workers and parallel apply