Re: Questions regarding distinct operation implementation

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>, pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Questions regarding distinct operation implementation
Date: 2022-12-01 21:51:58
Message-ID: CAKFQuwbd-+Mo7PwXYNJytBVRUe6dDGru41wjXf_fGgiYaxC==g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 1, 2022 at 2:37 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

>
> The question is, what do you want to make work? If you're not worried
> about supporting DISTINCT when there is an ORDER BY clause and the
> frame options are effectively ROWS BETWEEN UNBOUNDED PRECEDING AND
> UNBOUNDED FOLLOWING, then it's going to be much easier to make work.
> You never need to worry about rows dropping out of visibility in the
> frame. Simply all rows in the partition are in the frame.
>

I would definitely want the ability to have the output ordered and distinct
at the same time.

array_agg(distinct col) over (order by whatever)

Conceptually this seems like it can be trivially accomplished with a simple
lookup table, the key being the distinct column(s) and the value being a
counter - with the entry being removed when the counter goes to zero
(decreases happening each time a row goes out of scope). The main concern,
I suspect, isn't implementation ability, it is speed and memory consumption.

I would expect the distinct output to be identical to the non-distinct
output except for duplicates removed. Using array_agg as an example makes
seeing the distinction quite easy.

Thinking over the above a bit more, is something like this possible?

array_agg(distinct col order by col) over (order by whatever)

i.e., can we add order by within the aggregate to control its internal
ordering separately from the ordering needed for the window framing?

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-12-01 22:33:45 Re: Error-safe user functions
Previous Message David G. Johnston 2022-12-01 21:40:48 Re: Allow round() function to accept float and double precision