Re: [PoC] Implementation of distinct in Window Aggregates

From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: [PoC] Implementation of distinct in Window Aggregates
Date: 2023-03-12 07:25:48
Message-ID: 770824eb-a01e-354c-2434-7fed1dd832a4@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 04/01/23 18:10, Ankit Kumar Pandey wrote:
> On 29/12/22 20:58, Ankit Kumar Pandey wrote:
> >
> > On 24/12/22 18:22, Ankit Pandey wrote:
> >> Hi,
> >>
> >> This is a PoC patch which implements distinct operation in window
> >> aggregates (without order by and for single column aggregation, final
> >> version may vary wrt these limitations). Purpose of this PoC is to
> >> get feedback on the approach used and corresponding implementation,
> >> any nitpicking as deemed reasonable.
> >>
> >> Distinct operation is mirrored from implementation in nodeAgg.
> >> Existing partitioning logic determines if row is in partition and
> >> when distinct is required, all tuples for the aggregate column are
> >> stored in tuplesort. When finalize_windowaggregate gets called,
> >> tuples are sorted and duplicates are removed, followed by calling the
> >> transition function on each tuple.
> >> When distinct is not required, the above process is skipped and the
> >> transition function gets called directly and nothing gets inserted
> >> into tuplesort.
> >> Note: For each partition, in tuplesort_begin and tuplesort_end is
> >> involved to rinse tuplesort, so at any time, max tuples in tuplesort
> >> is equal to tuples in a particular partition.
> >>
> >> I have verified it for interger and interval column aggregates (to
> >> rule out obvious issues related to data types).
> >>
> >> Sample cases:
> >>
> >> create table mytable(id int, name text);
> >> insert into mytable values(1, 'A');
> >> insert into mytable values(1, 'A');
> >> insert into mytable values(5, 'B');
> >> insert into mytable values(3, 'A');
> >> insert into mytable values(1, 'A');
> >>
> >> select avg(distinct id) over (partition by name) from mytable;
> >>         avg
> >> --------------------
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 5.0000000000000000
> >>
> >> select avg(id) over (partition by name) from mytable;
> >>         avg
> >> --------------------
> >>  1.5000000000000000
> >>  1.5000000000000000
> >>  1.5000000000000000
> >>  1.5000000000000000
> >>  5.0000000000000000
> >>
> >> select avg(distinct id) over () from mytable;
> >>         avg
> >> --------------------
> >>  3.0000000000000000
> >>  3.0000000000000000
> >>  3.0000000000000000
> >>  3.0000000000000000
> >>  3.0000000000000000
> >>
> >> select avg(distinct id)  from mytable;
> >>         avg
> >> --------------------
> >>  3.0000000000000000
> >>
> >> This is my first-time contribution. Please let me know if anything
> >> can be
> >> improved as I`m eager to learn.
> >>
> >> Regards,
> >> Ankit Kumar Pandey
> >
> > Hi all,
> >
> > I know everyone is busy with holidays (well, Happy Holidays!) but I
> > will be glad if someone can take a quick look at this PoC and share
> > thoughts.
> >
> > This is my first time contribution so I am pretty sure there will be
> > some very obvious feedbacks (which will help me to move forward with
> > this change).
> >
> >
> Updated patch with latest master. Last patch was an year old.
>
Attaching patch with rebase from latest HEAD

Thanks,

Ankit

Attachment Content-Type Size
distinct_windows.patch text/x-patch 15.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ankit Kumar Pandey 2023-03-12 08:17:50 Re: [PoC] Implementation of distinct in Window Aggregates
Previous Message Thomas Munro 2023-03-12 07:18:30 Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED