[PATCH] distinct aggregates within a window function WIP

From: Krasiyan Andreev <krasiyan(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [PATCH] distinct aggregates within a window function WIP
Date: 2020-01-13 09:17:02
Message-ID: CAN1Pwonf4waD+PWkEFK8ANLua8fPjZ4DmV+hixO62+LiR8gwaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

I want to propose to you an old patch for Postgres 11, off-site developed
by Oliver Ford,
but I have permission from him to publish it and to continue it's
development,
that allow distinct aggregates, like select sum(distinct nums) within a
window function.

I have rebased it for current git master branch and have made necessary
changes to it to work with Postgres 13devel.

It's a WIP, because it doesn't have tests yet (I will add them later) and
also, it works for a int, float, and numeric types,
but probably distinct check can be rewritten for possible performance
improvement,
with storing the distinct elements in a hash table which should give a
performance improvement.

If you find the implementation of patch acceptable from committers
perspective,
I will answer to all yours design and review notes and will try to go ahead
with it,
also, I will add this patch to the March commit fest.

For example usage of a patch, if you have time series data, with current
Postgres you will get an error:

postgres=# CREATE TABLE t_demo AS
postgres-# SELECT ordinality, day, date_part('week', day) AS week
postgres-# FROM generate_series('2020-01-02', '2020-01-15', '1
day'::interval)
postgres-# WITH ORDINALITY AS day;
SELECT 14
postgres=# SELECT * FROM t_demo;
ordinality | day | week
------------+------------------------+------
1 | 2020-01-02 00:00:00+02 | 1
2 | 2020-01-03 00:00:00+02 | 1
3 | 2020-01-04 00:00:00+02 | 1
4 | 2020-01-05 00:00:00+02 | 1
5 | 2020-01-06 00:00:00+02 | 2
6 | 2020-01-07 00:00:00+02 | 2
7 | 2020-01-08 00:00:00+02 | 2
8 | 2020-01-09 00:00:00+02 | 2
9 | 2020-01-10 00:00:00+02 | 2
10 | 2020-01-11 00:00:00+02 | 2
11 | 2020-01-12 00:00:00+02 | 2
12 | 2020-01-13 00:00:00+02 | 3
13 | 2020-01-14 00:00:00+02 | 3
14 | 2020-01-15 00:00:00+02 | 3
(14 rows)

postgres=# SELECT *,
postgres-# array_agg(DISTINCT week) OVER (ORDER BY day ROWS
postgres(# BETWEEN 2 PRECEDING AND 2
FOLLOWING)
postgres-# FROM t_demo;
ERROR: DISTINCT is not implemented for window functions
LINE 2: array_agg(DISTINCT week) OVER (ORDER BY day ROWS
^

So you will need to write something like this:

postgres=# SELECT *, (SELECT array_agg(DISTINCT unnest) FROM unnest(x)) AS
b
postgres-# FROM
postgres-# (
postgres(# SELECT *,
postgres(# array_agg(week) OVER (ORDER BY day ROWS
postgres(# BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS x
postgres(# FROM t_demo
postgres(# ) AS a;
ordinality | day | week | x | b
------------+------------------------+------+-------------+-------
1 | 2020-01-02 00:00:00+02 | 1 | {1,1,1} | {1}
2 | 2020-01-03 00:00:00+02 | 1 | {1,1,1,1} | {1}
3 | 2020-01-04 00:00:00+02 | 1 | {1,1,1,1,2} | {1,2}
4 | 2020-01-05 00:00:00+02 | 1 | {1,1,1,2,2} | {1,2}
5 | 2020-01-06 00:00:00+02 | 2 | {1,1,2,2,2} | {1,2}
6 | 2020-01-07 00:00:00+02 | 2 | {1,2,2,2,2} | {1,2}
7 | 2020-01-08 00:00:00+02 | 2 | {2,2,2,2,2} | {2}
8 | 2020-01-09 00:00:00+02 | 2 | {2,2,2,2,2} | {2}
9 | 2020-01-10 00:00:00+02 | 2 | {2,2,2,2,2} | {2}
10 | 2020-01-11 00:00:00+02 | 2 | {2,2,2,2,3} | {2,3}
11 | 2020-01-12 00:00:00+02 | 2 | {2,2,2,3,3} | {2,3}
12 | 2020-01-13 00:00:00+02 | 3 | {2,2,3,3,3} | {2,3}
13 | 2020-01-14 00:00:00+02 | 3 | {2,3,3,3} | {2,3}
14 | 2020-01-15 00:00:00+02 | 3 | {3,3,3} | {3}
(14 rows)

With attached version, you will get the desired results:

postgres=# SELECT *,
postgres-# array_agg(DISTINCT week) OVER (ORDER BY day ROWS
postgres(# BETWEEN 2 PRECEDING AND 2
FOLLOWING)
postgres-# FROM t_demo;
ordinality | day | week | array_agg
------------+------------------------+------+-----------
1 | 2020-01-02 00:00:00+02 | 1 | {1}
2 | 2020-01-03 00:00:00+02 | 1 | {1}
3 | 2020-01-04 00:00:00+02 | 1 | {1,2}
4 | 2020-01-05 00:00:00+02 | 1 | {1,2}
5 | 2020-01-06 00:00:00+02 | 2 | {1,2}
6 | 2020-01-07 00:00:00+02 | 2 | {1,2}
7 | 2020-01-08 00:00:00+02 | 2 | {2}
8 | 2020-01-09 00:00:00+02 | 2 | {2}
9 | 2020-01-10 00:00:00+02 | 2 | {2}
10 | 2020-01-11 00:00:00+02 | 2 | {2,3}
11 | 2020-01-12 00:00:00+02 | 2 | {2,3}
12 | 2020-01-13 00:00:00+02 | 3 | {2,3}
13 | 2020-01-14 00:00:00+02 | 3 | {2,3}
14 | 2020-01-15 00:00:00+02 | 3 | {3}
(14 rows)

Attachment Content-Type Size
pg13-distinct-window.patch text/x-patch 11.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergei Kornilov 2020-01-13 09:21:58 Re: [HACKERS] Block level parallel vacuum
Previous Message Michael Paquier 2020-01-13 08:59:38 Re: Question regarding heap_multi_insert documentation