2D array aggregation performance (array_agg for arrays)

From: Dennis Runz <d(dot)runz(at)stud(dot)uni-heidelberg(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: 2D array aggregation performance (array_agg for arrays)
Date: 2012-01-16 17:31:34
Message-ID: CALB1XpLXq9tKvGPS158KQ7c6rXwjgr8_56G8hCCbN3kU8h=xXA@mail.gmail.com
Lists: pgsql-general

Hello Community,

I am working on a database extension for PostgreSQL (8.4+) to support
functions for spectral graph theory of spatial/geometric graphs like
proteins. For this purpose we need to store and use huge multidimensional
arrays in the database (adjacency matrix for graph).

The performance-critical function here is the aggregation of
one-dimensional arrays into a two-dimensional array,
e.g. {1,2} and {3,4} => {{1,2},{3,4}}; in other words, aggregating a set of
arrays into an array of arrays.

The array_agg function performs well, but only supports aggregation of
scalar (non-array) elements into arrays. For performance reasons, we need a
similar function that is able to aggregate arrays as shown above. Other
functions like array_cat reallocate the array after each aggregation step,
which doesn't scale.
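
To make the scaling concern concrete, here is a minimal standalone sketch in plain C of the accumulation pattern we are after. It is not PostgreSQL code and uses none of its APIs; the names RowAccum, row_accum_init, row_accum_push and row_accum_free are hypothetical and exist only for this illustration. Rows are appended into one flat, row-major buffer whose capacity doubles when it is full, so the buffer is reallocated only O(log n) times for n rows rather than once per row.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator (not a PostgreSQL type): rows of equal length
 * are appended into one flat, row-major buffer. */
typedef struct RowAccum {
    int    *data;      /* nrows * ncols values, row-major */
    size_t  nrows;     /* rows stored so far */
    size_t  ncols;     /* row length, fixed by the first row pushed */
    size_t  cap_rows;  /* rows the buffer can hold before it must grow */
} RowAccum;

static void row_accum_init(RowAccum *acc)
{
    acc->data = NULL;
    acc->nrows = 0;
    acc->ncols = 0;
    acc->cap_rows = 0;
}

/* Append one row. Returns 0 on success, -1 if the row length does not
 * match earlier rows or if memory allocation fails. */
static int row_accum_push(RowAccum *acc, const int *row, size_t ncols)
{
    if (ncols == 0)
        return -1;
    if (acc->ncols == 0)
        acc->ncols = ncols;          /* first row fixes the row length */
    else if (acc->ncols != ncols)
        return -1;                   /* a 2D array must be rectangular */

    if (acc->nrows == acc->cap_rows) {
        /* Double the capacity instead of growing by one row per step. */
        size_t new_cap = acc->cap_rows ? acc->cap_rows * 2 : 8;
        int *grown = realloc(acc->data, new_cap * acc->ncols * sizeof(int));
        if (grown == NULL)
            return -1;
        acc->data = grown;
        acc->cap_rows = new_cap;
    }

    memcpy(acc->data + acc->nrows * acc->ncols, row, ncols * sizeof(int));
    acc->nrows++;
    return 0;
}

static void row_accum_free(RowAccum *acc)
{
    free(acc->data);
    row_accum_init(acc);
}
```

Pushing {1,2} and then {3,4} leaves data holding 1,2,3,4 with nrows = 2 and ncols = 2, which is the row-major layout of {{1,2},{3,4}}. A final step would only have to attach the dimensions to this buffer once.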

I am now trying to implement an array_agg for array-of-arrays aggregation,
using array_agg_transfn (-> hd_array_transfn) and array_agg_finalfn (->
hd_array_finalfn) from the Postgres 9.1 sources as a starting point.

This is what the current code looks like:
https://gist.github.com/5b2b60a939bec8410382

I assume it is not sufficient to simply adapt the final function to create a
2D array? I tried this, but Postgres crashes in:

(gdb) bt
#0 pg_detoast_datum (datum=0x0) at fmgr.c:2233
#1 0x00ab9303 in construct_md_array (elems=0x220ffbb0, nulls=0x220ffcb8 "", ndims=2, dims=0xbf84c694, lbs=0xbf84c69c, elmtype=1007, elmlen=-1, elmbyval=0 '\000', elmalign=105 'i') at arrayfuncs.c:2936
#2 0x00ac0052 in makeMdArrayResult (astate=0x220ffb88, ndims=2, dims=0xbf84c694, lbs=0xbf84c69c, rcontext=0x220d8aa8, release=0 '\000') at arrayfuncs.c:4665
#3 0x0056c9d1 in hd_array_finalfn () from /usr/lib/postgresql/9.1/lib/hd_array.so
#4 0x009c4ffa in finalize_aggregate (aggstate=<optimized out>, peraggstate=0x220f9d58, pergroupstate=0x220f9e60, resultVal=0x220f9d38, resultIsNull=0x220f9d48 "") at nodeAgg.c:758
# ...

I am a novice to Postgres internals and Postgres programming and would
greatly appreciate it if anyone could help me with this implementation
problem.

We are using PostgreSQL 9.1, but the aggregate should eventually also run
on 8.4.

Best Regards,
Dennis
