Review Report: propose to include 3 new functions into intarray and intagg

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: dmitry(at)koterov(dot)ru
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Review Report: propose to include 3 new functions into intarray and intagg
Date: 2008-09-06 11:34:10
Message-ID: 48C26AB2.5050009@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

this is my first "official" review. I've tried to follow the "Review a
patch" guidelines from the wiki - thanks Simon, that was pretty helpful.

This review covers only the intagg additions.

Dmitry Koterov wrote:
> Here are these functions with detailed documentation:
> http://en.dklab.ru/lib/dklab_postgresql_patch/

Submission review: we generally prefer having patches archived on our
mailing lists, so please just send future patches or revisions of this
patch to our lists (I prefer -hackers, but probably -patches is still
the official one). The documentation should go into a README file of the
contrib module or some such. Tests are missing, but that's the case for
the intagg contrib module anyway. The patch applies and is in context
diff format, good. It adds an unrelated newline, which should generally
be avoided.

Usability review: functional additions look good and usable, certainly
within a contrib module. I don't think it's compliant to any spec, but
that's not required for contrib, IMO.

Functional test: works as advertised on the accompanying website. I've
tested the int_array_append_aggregate somewhat, see [1].

Performance testing: I've tested with 10 mio rows of arrays of size 1
and compared against core's int_array_aggregate function, see [1] again.
In this simple test it took roughly 50% longer, which seems okay. Memory
consumption looks sane as well.

Coding review: style seems fine for contrib, though lines longer than 80
cols should be broken up. Comments in the code are sparse , some even
counter-productive (marking something as "additional things" certainly
doesn't help).

Code and architecture review: the new int_array_append_aggregate()
functions itself seems fine to me.

Summary: My general feeling is, that this patch should be applied after
minor code style corrections. As a longer term goal I think intagg
should be integrated into core, since it's very basic functionality.
TODO entries for things like an array_accum() aggregate already exist.
Adding this patch to contrib now might be a step into the right direction.

Dmitry, can you please apply these small corrections and re-submit the
patch?

Regards

Markus Wanner

P.S.: I dislike the intagg's use of PGARRAY, but that's nothing to do
with this patch. Shouldn't this better use a real composite type as the
aggregate's state type? I'd propose to clean up the intagg contrib
module and prepare it for inclusion into core.

[1]: functional and performance testing session:

On a database with (patched) intagg and intarr contrib modules:

markus=# CREATE TABLE test (id INT NOT NULL, arr INT[] NOT NULL);
CREATE TABLE
markus=# INSERT INTO test VALUES (1, ARRAY[1,2,3]), (2, ARRAY[4,5]), (3,
ARRAY[3,2,1]);
INSERT 0 3
markus=# SELECT * FROM test;
id | arr
----+---------
1 | {1,2,3}
2 | {4,5}
3 | {3,2,1}
(3 rows)

markus=# SELECT int_array_append_aggregate(arr) FROM test;
int_array_append_aggregate
----------------------------
{1,2,3,4,5,3,2,1}
(1 row)

markus=# SELECT * FROM test;
id | arr
----+---------
1 | {1,2,3}
2 | {4,5}
3 | {3,2,1}
(3 rows)

markus=# SELECT int_array_aggregate(id) AS ids,
int_array_append_aggregate(arr) FROM test GROUP BY (id / 2);
ids | int_array_append_aggregate
-------+----------------------------
{1} | {1,2,3}
{2,3} | {4,5,3,2,1}
(2 rows)

markus=# SELECT int_array_aggregate(id) AS ids,
int_array_append_aggregate(arr) FROM test GROUP BY (id % 2);
ids | int_array_append_aggregate
-------+----------------------------
{2} | {4,5}
{1,3} | {1,2,3,3,2,1}
(2 rows)

markus=# INSERT INTO test VALUES (4, NULL);
INSERT 0 1

markus=# SELECT int_array_append_aggregate(arr) FROM test;
int_array_append_aggregate
----------------------------
{1,2,3,4,5,3,2,1}
(1 row)

markus=# SELECT id, int_array_append_aggregate(arr) FROM test GROUP BY id;
id | int_array_append_aggregate
----+----------------------------
4 | {}
2 | {4,5}
3 | {3,2,1}
1 | {1,2,3}
(4 rows)

-- switching to performance testing

markus=# \timing
Timing is on.

markus=# DELETE FROM test;
DELETE 4
Time: 9.037 ms

markus=# INSERT INTO test SELECT generate_series(1, 10000000),
array[round(random() * 100)]::int[];
INSERT 0 10000000
Time: 53321.186 ms

markus=# SELECT icount(int_array_aggregate(id)) AS count FROM test;
count
----------
10000000
(1 row)

Time: 2493.184 ms

markus=# SELECT icount(int_array_append_aggregate(arr)) AS count FROM test;
count
----------
10000000
(1 row)

Time: 4152.478 ms

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark 2008-09-06 12:06:04 Re: pgsql: Fix an oversight in the 8.2 patch that improved mergejoin
Previous Message Greg Smith 2008-09-06 08:58:00 Re: Prototype: In-place upgrade v02