Re: grouping/clustering query

From: "Tony Wasson" <ajwasson(at)gmail(dot)com>
To: "Steve Midgley" <science(at)misuse(dot)org>
Cc: Joe <dev(at)freedomcircle(dot)net>, pgsql-sql(at)postgresql(dot)org, "David Garamond" <davidgaramond(at)gmail(dot)com>
Subject: Re: grouping/clustering query
Date: 2008-10-24 17:24:55
Message-ID: 6d8daee30810241024i661c73eag6edc20ab051da09b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Fri, Oct 24, 2008 at 8:04 AM, Steve Midgley <science(at)misuse(dot)org> wrote:
> At 11:28 AM 10/23/2008, Joe wrote:
>>
>> Steve Midgley wrote:
>>>>
>>>> # (invoiceid, txid)
>>>> (A, 1)
>>>> (A, 3)
>>>> (B, 1)
>>>> (B, 2)
>>>> (C, 5)
>>>> (D, 6)
>>>> (D, 7)
>>>> (E, 8)
>>>> (F, 8)
>>>>
>>>> For journalling, I need to group/cluster this together. Is there a SQL
>>>> query that can generate this output:
>>>>
>>>> # (journal: invoiceids, txids)
>>>> [A,B] , [1,2,3]
>>>> [C], [5]
>>>> [D], [6,7]
>>>> [E,F], [8]
>>>
>>> Hi Dave,
>>>
>>> I'm not following the logic here. A has 1,3 and B has 1,2. So why does
>>> the first line print:
>>>
>>>> [A,B] , [1,2,3]
>>>
>>> What's the rule that tells the query to output this way? Is it that all
>>> of B's values are between A's values?
>>
>> From a purely accounting standpoint, since transaction 1 was applied to
>> both invoices A and B, you need to group the invoices so that you can
>> compare total invoiced against total paid.
>
> I tinkered around briefly but didn't come up with a good idea, but I bet
> someone on this list can. However, I did create a CREATE script for your
> table design which, in my experience, makes it more likely that a real
> expert will take on your problem..
>
> Hope this helps,
>
> Steve
>
> DROP TABLE IF EXISTS trans;
>
> CREATE TABLE trans
> (
> id serial NOT NULL,
> inv_id character varying,
> tx_id character varying,
> CONSTRAINT pk_id PRIMARY KEY (id)
> )
> WITH (OIDS=FALSE);
>
> insert into trans (inv_id, tx_id) values('A','1');
> insert into trans (inv_id, tx_id) values('A','3');
> insert into trans (inv_id, tx_id) values('B','1');
> insert into trans (inv_id, tx_id) values('B','2');
> insert into trans (inv_id, tx_id) values('C','5');
> insert into trans (inv_id, tx_id) values('D','6');
> insert into trans (inv_id, tx_id) values('D','7');
> insert into trans (inv_id, tx_id) values('E','8');
> insert into trans (inv_id, tx_id) values('F','8');

This is as close as I can get the data. I think I'd need a custom
array grouping aggregate to get the results to match completely.
Notice how ABC are on their own lines?

test=# SELECT inv_array, tx_array
FROM (
SELECT tx_id, array_accum(inv_id) AS inv_array
FROM trans
GROUP BY tx_id
ORDER BY tx_id
) AS t
JOIN (
SELECT inv_id, array_accum(tx_id) AS tx_array
FROM trans
GROUP BY inv_id
ORDER BY inv_id
) AS i ON (t.tx_id = ANY(i.tx_array) OR i.inv_id =ANY(t.inv_array))
GROUP BY tx_array,inv_array
;
inv_array | tx_array
-----------+----------
{A,B} | {1,2}
{B} | {1,2}
{A} | {1,3}
{A,B} | {1,3}
{C} | {5}
{D} | {6,7}
{E,F} | {8}
(7 rows)

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tony Wasson 2008-10-24 20:48:51 Re: grouping/clustering query
Previous Message Oliveiros Cristina 2008-10-24 15:20:02 Re: grouping/clustering query