Re: intarray internals

From: Volkan YAZICI <yazicivo(at)ttnet(dot)net(dot)tr>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: intarray internals
Date: 2006-05-07 10:21:36
Message-ID: 20060507102136.GA202@alamut
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi,

I've prepared a Quick & Dirty patch serie for some missing parts in
intarray contrib module. Here they go with some explanation...

On May 06 12:38, Volkan YAZICI wrote:
> [4]
> In the inner_int_contains() function of _int_tool.c, there's a while
> loop like
>
> [Code assumes that arrays are sorted.]
>
> na = ARRNELEMS(a);
> nb = ARRNELEMS(b);
> da = ARRPTR(a);
> db = ARRPTR(b);
>
> i = j = n = 0;
> while (i < na && j < nb)
> if (da[i] < db[j])
> i++;
> else if (da[i] == db[j])
> {
> n++;
> i++;
> j++;
> }
> else
> j++;
>
> return (n == nb) ? TRUE : FALSE;
>
> AFAICS, last "j++" should be replaced with "return FALSE". Because, "n"
> cannot be equal to "nb" no more, if "j" gets incremented without
> incrementing "n" (remember "j < nb" in the "while" condition).

intarray_contains.patch.0 is for above problem.

[5]
ISTM, inner_int_union() of _int_tool.c, makes some redundant visits to
array:

...
/* union */
i = j = 0;
while (i < na && j < nb)
if (da[i] < db[j])
*dr++ = da[i++];
else
*dr++ = db[j++];

while (i < na)
*dr++ = da[i++];
while (j < nb)
*dr++ = db[j++];

}

if (ARRNELEMS(r) > 1)
r = _int_unique(r);

IMHO, uniting unique values (instead of uniting and then removing
duplicates) should work faster. intarray_union.patch.0 is for this
problem. (Patched code, handles uniting for unique values.)

[6]
There's a seperate sorting algorithm isort() in _int_tool.c. Looks like
it executes some kind of shell sort on the array and returns true if
array has any duplicates. It's used for common sorting and deciding on
executing _int_unique() on the related array if isort() says it has
duplicates.

IIRC, our inner qsort() has a smaller O(N) degree when compared to above
sorting algorithm. Also for the code's sake it'd be better to switch
using qsort() in all sorting related stuff. For these reasons,
intarray_sort.patch.0 addresses this (probable) gotcha.

All 3 patches passed regression tests for intarray contrib module. But
these are just for addressing some gotchas I found while reading code.
Your comments for these problems(?) are welcome.

Regards.

Attachment Content-Type Size
intarray_contains.patch.0 text/plain 546 bytes
intarray_sort.patch.0 text/plain 2.3 KB
intarray_union.patch.0 text/plain 2.3 KB

In response to

Browse pgsql-general by date

  From Date Subject
Next Message P.Mo 2006-05-07 18:07:39 Winning
Previous Message Lincoln Yeoh 2006-05-07 09:48:32 Re: Can't Figure Out Where Rows Are Going

Browse pgsql-hackers by date

  From Date Subject
Next Message Gurjeet Singh 2006-05-07 12:03:42 Re: [pgsql-hackers-win32] Build with Visual Studio & MSVC
Previous Message Gurjeet Singh 2006-05-07 10:15:23 Fwd: [pgsql-hackers-win32] Build with Visual Studio & MSVC