Re: Review Report: propose to include 3 new functions into intarray and intagg

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: dmitry(at)koterov(dot)ru, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Review Report: propose to include 3 new functions into intarray and intagg
Date: 2008-09-15 11:52:11
Message-ID: 48CE4C6B.6050908@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Markus Wanner wrote:
> Dmitry Koterov wrote:
>> But, what about intarray patch? Does somebody plan to review it? I'd
>> prefer to include it too. If you approve, I'll correct the code style
>> in intarray contrib patch too.
>
> I've already volunteered for reviewing it as well. I just felt like
> splitting things up...

Looking at the intarray patch:

I find it a bit unfriendly to have a function that depends on sorted
input, but doesn't check it. But that's probably not a good enough
reason to reject an otherwise simple and useful function. Also, we
already have uniq, which doesn't strictly speaking require sorted input,
but it does if you want to eliminate all duplicates from the array.

_int_group_count_sort seems a bit special purpose. Why does it bother to
sort the output? That's wasted time if you don't need sorted output, or
if you want the array sorted by the integer value instead of frequency.
If you want sorted output, you can just sort it afterwards.

Also, it's requiring sorted input for a small performance gain, but
there's a lot more precedence in the existing intarray functions to not
require sorted input, but to sort the input instead (union, intersect,
same, overlap).

I realize that the current implementation is faster for the use case
where the input is sorted, and output needs to be sorted, but if we go
down that path we'll soon have dozens of different variants of various
functions, with different ordering requirements of inputs and outputs.

So, I'd suggest changing _int_group_count_sort so that it doesn't
require sorted input, and doesn't sort the output. The binary search
function looks good to me (I think I'd prefer naming it bsearch(),
though, though I can see that it was named bidx in reference to the
existing idx function). Also, as Markus pointed out, the SGML docs need
to be updated.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2008-09-15 12:09:45 Re: no XLOG during COPY?
Previous Message Simon Riggs 2008-09-15 11:04:01 Re: Transaction Snapshots and Hot Standby