Fwd: grouping/clustering query

From: "Osvaldo Kussama" <osvaldo(dot)kussama(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Fwd: grouping/clustering query
Date: 2008-10-27 19:28:09
Message-ID: 690707f60810271228t316cc37dofff3f1831a2e8c2f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I forgot the list.

---------- Forwarded message ----------
From: Osvaldo Kussama <osvaldo(dot)kussama(at)gmail(dot)com>
Date: Mon, 27 Oct 2008 12:28:57 -0200
Subject: Re: [SQL] grouping/clustering query
To: David Garamond <davidgaramond(at)gmail(dot)com>

2008/10/24, David Garamond <davidgaramond(at)gmail(dot)com>:
> Tony, Joe, Steve,
>
> Thanks for the follow-ups. Yes, the problem is related to double-entry
> accounting, where one needs to balance total debit and credit
> (payments and invoices) in each journal/transaction.
>
> Due to time constraint, I ended up doing this in the client-side
> programming language, since I am nowhere near fluent in PLs. The
> algorithm should be simple (at least the "brute force" version), it's
> basically checking if each element of the pair (txid, invoiceid) is
> already mentioned in some journal and if it is, add the pair to the
> journal, otherwise create a new journal with that pair as the first
> entry. I believe this can easily be implemented in a PL. But still I
> wonder if there is some SQL incantation that can do the same without
> any PL.
>

Interesting problem.

I think there are no SQL-only solution.

Using arrays and PL/pgSQL function:
bdteste=# SELECT * FROM bar;
aid | bid
-----+-----
A | 1
A | 3
B | 1
B | 2
C | 5
D | 6
D | 7
E | 8
F | 8
(9 registros)

bdteste=# CREATE OR REPLACE FUNCTION combina() RETURNS setof record AS $$
bdteste$# DECLARE
bdteste$# res record;
bdteste$# res1 record;
bdteste$# BEGIN
bdteste$# CREATE TEMP TABLE foobar(
bdteste$# fbaid text[],
bdteste$# fbbid int[])
bdteste$# ON COMMIT DROP;
bdteste$#
bdteste$# FOR res IN SELECT agr1, bid FROM (SELECT bid,
array_accum(aid) AS agr1 FROM bar
bdteste$# GROUP BY bid) b1 ORDER BY array_upper(agr1,
1) DESC, agr1 LOOP
bdteste$# SELECT * INTO res1 FROM foobar WHERE fbaid @> res.agr1;
bdteste$# IF NOT FOUND THEN
bdteste$# INSERT INTO foobar VALUES (res.agr1, array[res.bid]);
bdteste$# ELSE
bdteste$# UPDATE foobar SET fbbid = array_append(fbbid,
res.bid) WHERE fbaid @> res.agr1;
bdteste$# END IF;
bdteste$# END LOOP;
bdteste$#
bdteste$# RETURN QUERY SELECT * FROM foobar;
bdteste$# END;
bdteste$# $$ LANGUAGE PLPGSQL;
CREATE FUNCTION
bdteste=#
bdteste=# SELECT * FROM combina() AS(a text[], b int[]);
a | b
-------+---------
{E,F} | {8}
{A,B} | {1,3,2}
{C} | {5}
{D} | {7,6}
(4 registros)

Osvaldo

PS. - Aggregate array_accum defined at:
http://www.postgresql.org/docs/current/interactive/xaggr.html
- If you need sorted arrays use Andreas Kretschmer's function array_sort:
http://archives.postgresql.org/pgsql-general/2007-02/msg01534.php

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2008-10-27 19:46:30 Re: How to hand over array as variable in plpgsql function?
Previous Message Jan Peters 2008-10-27 16:25:37 How to hand over array as variable in plpgsql function?