Skip site navigation (1) Skip section navigation (2)

Re: Review Report: propose to include 3 new functions into intarray and intagg

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: dmitry(at)koterov(dot)ru, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Review Report: propose to include 3 new functions into intarray and intagg
Date: 2008-09-15 12:38:40
Message-ID: 48CE5750.6070309@bluegap.ch (view raw or flat)
Thread:
Lists: pgsql-hackers
Hi,

sorry for not having completed this review, yet. As you are obviously 
looking at the patch as well, I'll try to quickly write down my points 
so far.

Trying to compile the intarray module, I now receive an error:

error: ‘INT4OID’ undeclared (first use in this function)

That can be solved by including "catalog/pg_type.h" from 
contrib/intarr/_int_op.c.


The PG_FUNCTION_INFO_V1 and prototype definition certainly belong to the 
top of the file, where all others are.

Some lines are longer than 80 columns and again comments are a bit 
sparse or even useless (no "additional things", please).


Heikki Linnakangas wrote:
> I find it a bit unfriendly to have a function that depends on sorted 
> input, but doesn't check it. But that's probably not a good enough 
> reason to reject an otherwise simple and useful function. Also, we 
> already have uniq, which doesn't strictly speaking require sorted input, 
> but it does if you want to eliminate all duplicates from the array.

I think it's a performance optimization which is absolutely required in 
some cases. Some time ago I've also had to rip out the sorting step from 
certain intarray module functions to save processing time.

One option already mentioned somewhere would be saving a 'sorted' 
property for the array. Then again, I think such a thing would certainly 
have to be done globally, for all kinds of arrays.

> _int_group_count_sort seems a bit special purpose. Why does it bother to 
> sort the output? That's wasted time if you don't need sorted output, or 
> if you want the array sorted by the integer value instead of frequency. 
> If you want sorted output, you can just sort it afterwards.

Agreed. IMO the normal GROUP BY and ORDER BY stuff of the database 
itself should be used for such a thing. However, that means turning an 
array into a set of rows...

> Also, it's requiring sorted input for a small performance gain, but 
> there's a lot more precedence in the existing intarray functions to not 
> require sorted input, but to sort the input instead (union, intersect, 
> same, overlap).

..and exactly these are the functions I had to wrap again to strip the 
sorting step, due to poor performance for known-sorted arrays.

> I realize that the current implementation is faster for the use case 
> where the input is sorted, and output needs to be sorted, but if we go 
> down that path we'll soon have dozens of different variants of various 
> functions, with different ordering requirements of inputs and outputs.

Agreed.

However, given the OP is using that in production, there seems to be a 
use case for the optimization, where we have none for the same function 
without it.

> So, I'd suggest changing _int_group_count_sort so that it doesn't 
> require sorted input, and doesn't sort the output. The binary search 
> function looks good to me (I think I'd prefer naming it bsearch(), 
> though, though I can see that it was named bidx in reference to the 
> existing idx function). Also, as Markus pointed out, the SGML docs need 
> to be updated.

As is, it should probably also carry the '_int' prefix, because it's not 
a general purpose array function. So propose to name it '_int_bsearch'.

Overall I think these functions are overly specialized and should be 
replaced be more general counterparts in core. However, until we have 
that, it's hard to refuse such a thing for contrib.

By that reasoning, however, the intarray would have to provide methods 
for sorted as well as asorted input arrays as well.

I'm closing my part of reviewing of this patch now. Dmitry, how do you 
want to proceed with these patches? Do you have time for some cleaning 
up and writing documentation?

Regards

Markus Wanner

In response to

Responses

pgsql-hackers by date

Next:From: Simon RiggsDate: 2008-09-15 12:40:19
Subject: Re: Transaction Snapshots and Hot Standby
Previous:From: Gregory StarkDate: 2008-09-15 12:13:04
Subject: Re: Transaction Snapshots and Hot Standby

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group