Re: Questions regarding distinct operation implementation

From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Questions regarding distinct operation implementation
Date: 2022-11-24 17:57:11
Message-ID: 60471090-3d24-565f-cdbb-5150a8895780@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 23/11/22 23:48, Ankit Kumar Pandey wrote:
>
> Hello,
>
>
> I have questions regarding distinct operation and would be glad if
> someone could help me out.
>
> Consider the following table (mytable):
>
> id, name
>
> 1, A
>
> 1, A
>
> 2, B
>
> 3, A
>
> 1, A
>
> If we do /select avg(id) over (partition by name) from mytable/,
> partition logic goes like this:
>
> for A: 1, 1, 3, 1
>
> If we want to implement something like this /select avg(distinct id)
> over (partition by name) from m/ytable
>
> and remove duplicate by storing last datum of aggregate column (id)
> and comparing it with current value. It fails here because aggregate
> column is not sorted within the partition.
>
> Questions:
>
> 1. Is sorting prerequisite for finding distinct values?
>
> 2. Is it okay to sort aggregate column (within partition) for distinct
> to work in case of window function?
>
> 3. Is an alternative way exists to handle this scenario (because here
> sort is of no use in aggregation)?
>
>
> Thanks
>
>
> --
> Regards,
> Ankit Kumar Pandey

Hi,

After little more digging, I can see that aggregation on Window
functions are of running type, it would be bit more effective if a
lookup hashtable is created where every value in current aggregate
column get inserted. Whenever frame moves ahead, a lookup if performed
for presence of duplicate.

On performance standpoint, this might be bad idea though.

Please let me know any opinions on this.

--
Regards,
Ankit Kumar Pandey

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Cary Huang 2022-11-24 18:15:39 Re: Patch: Global Unique Index
Previous Message Simon Riggs 2022-11-24 17:42:31 Re: Add 64-bit XIDs into PostgreSQL 15