Re: grouping/clustering query

From: "Tony Wasson" <ajwasson(at)gmail(dot)com>
To: "Steve Midgley" <science(at)misuse(dot)org>
Cc: Joe <dev(at)freedomcircle(dot)net>, pgsql-sql(at)postgresql(dot)org, "David Garamond" <davidgaramond(at)gmail(dot)com>
Subject: Re: grouping/clustering query
Date: 2008-10-24 20:48:51
Message-ID: 6d8daee30810241348x4a5d4f22le5622be56a1ade27@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Fri, Oct 24, 2008 at 10:24 AM, Tony Wasson <ajwasson(at)gmail(dot)com> wrote:
> On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <science(at)misuse(dot)org> wrote:
>> At 11:28 AM 10/23/2008, Joe wrote:
>>>
>>> Steve Midgley wrote:
>>>>>
>>>>> # (invoiceid, txid)
>>>>> (A, 1)
>>>>> (A, 3)
>>>>> (B, 1)
>>>>> (B, 2)
>>>>> (C, 5)
>>>>> (D, 6)
>>>>> (D, 7)
>>>>> (E, 8)
>>>>> (F, 8)
>>>>>
>>>>> For journalling, I need to group/cluster this together. Is there a SQL
>>>>> query that can generate this output:
>>>>>
>>>>> # (journal: invoiceids, txids)
>>>>> [A,B] , [1,2,3]
>>>>> [C], [5]
>>>>> [D], [6,7]
>>>>> [E,F], [8]
>>>>
>>>> Hi Dave,
>>>>
>>>> I'm not following the logic here. A has 1,3 and B has 1,2. So why does
>>>> the first line print:
>>>>
>>>>> [A,B] , [1,2,3]
>>>>
>>>> What's the rule that tells the query to output this way? Is it that all
>>>> of B's values are between A's values?
>>>
>>> From a purely accounting standpoint, since transaction 1 was applied to
>>> both invoices A and B, you need to group the invoices so that you can
>>> compare total invoiced against total paid.
>>
>> I tinkered around briefly but didn't come up with a good idea, but I bet
>> someone on this list can. However, I did create a CREATE script for your
>> table design which, in my experience, makes it more likely that a real
>> expert will take on your problem..
>>
>> Hope this helps,
>>
>> Steve
>>
>> DROP TABLE IF EXISTS trans;
>>
>> CREATE TABLE trans
>> (
>> id serial NOT NULL,
>> inv_id character varying,
>> tx_id character varying,
>> CONSTRAINT pk_id PRIMARY KEY (id)
>> )
>> WITH (OIDS=FALSE);
>>
>> insert into trans (inv_id, tx_id) values('A','1');
>> insert into trans (inv_id, tx_id) values('A','3');
>> insert into trans (inv_id, tx_id) values('B','1');
>> insert into trans (inv_id, tx_id) values('B','2');
>> insert into trans (inv_id, tx_id) values('C','5');
>> insert into trans (inv_id, tx_id) values('D','6');
>> insert into trans (inv_id, tx_id) values('D','7');
>> insert into trans (inv_id, tx_id) values('E','8');
>> insert into trans (inv_id, tx_id) values('F','8');

Here's a stab at a custom aggregate attempting to explain what's going
on inside.

CREATE OR REPLACE FUNCTION varchar_array_accum_unique (x VARCHAR[], y
VARCHAR) RETURNS VARCHAR[] AS $$
DECLARE
res VARCHAR[];
BEGIN
RAISE NOTICE 'input state is %',x;
RAISE NOTICE 'input variable is %',y;
IF x = '{}' THEN
RAISE NOTICE 'x is empty, returning input variable %',y;
res[1] := y;
ELSE
RAISE NOTICE 'input array is not empty, checking if input
variable is a member %',y;
res := x;
IF y = ANY(res) THEN
RAISE NOTICE 'y is already in array %, skipping',res;
ELSE
res := array_append(res, y);
RAISE NOTICE 'appending input variable %',y;
END IF;
END IF;
RETURN res;
END
$$ LANGUAGE plpgsql STRICT;

DROP AGGREGATE array_accum_unique(VARCHAR);
CREATE AGGREGATE array_accum_unique
(
basetype = VARCHAR
, sfunc = varchar_array_accum_unique
, stype = VARCHAR[]
, initcond = '{}'
);

SELECT array_accum_unique(inv_id) AS invoiceids
, array_accum_unique(tx_id) AS transactionids
FROM (
SELECT tx_id, inv_id
FROM trans
WHERE inv_id IN (
SELECT inv_id
FROM trans
WHERE id IN (
SELECT id FROM trans
WHERE tx_id=1
)
)
ORDER BY tx_id, inv_id
) AS ss
;

Returns this result for transaction id 1.

invoiceids | transactionids
------------+----------------
{A,B} | {1,2,3}
(1 row)

Hope this helps!
Tony

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message David Garamond 2008-10-25 01:50:11 Re: grouping/clustering query
Previous Message Tony Wasson 2008-10-24 17:24:55 Re: grouping/clustering query