Re: Removing redundant itemsets

From: Allan Kamau <allank(at)sanbi(dot)ac(dot)za>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Removing redundant itemsets
Date: 2008-03-31 09:58:50
Message-ID: 47F0B5DA.7020101@sanbi.ac.za
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Craig,
Thank you so much for the solution. I have spent many hours since
Thursday last week including the weekend (and it took you just a few
minutes) trying to figure out a solution not involving procedural
programming and looping (as the size of the items and even the number of
"purchases" in the datasets I may be working with may be large), I was
looking for a solution that may take (almost) polynomial time (and
resources) and also make use of Postgresql refined and efficient engine.
Your solution satisfies these requirements. Thanks.

Allan.

Craig Ringer wrote:
> Allan Kamau wrote:
>
>> Hi all,
>> I have a list of purchases (market basket) and I would like to select
>> non redundant longest possible patterns by eliminating
>> (creating/populating other table to contain only non redandant itemsets)
>> purchases having item lists which are fully included in at least one
>> other purchase.
>>
>
> Here's a possibly slow and surely ugly solution (I think it's right,
> though I haven't done more than passing testing):
>
>
>
> CREATE VIEW togo_as_arr AS
> SELECT a.tid,
> ARRAY(SELECT item FROM togo b WHERE b.tid = a.tid ORDER BY item)
> AS items
> FROM togo a GROUP BY tid;
>
> SELECT arr_a.tid AS redundant_tid, arr_b.tid AS contained_by
> FROM togo_as_arr arr_a CROSS JOIN togo_as_arr arr_b
> WHERE arr_a.tid <> arr_b.tid AND arr_a.items <@ arr_b.items;
>
>
>
> (the view isn't necessary, but does improve the readability of the query).
>
> It groups the purchases up with item lists as arrays, then finds any
> purchases with items arrays wholly contained by other item arrays from
> other purchases.
>
> I'm *sure* there's a smarter way to do this that avoids the use of
> arrays, but I don't seem to be able to come up with one right now. It's
> interesting, though, so I might keep fiddling.
>
> --
> Craig Ringer
>

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Craig Ringer 2008-03-31 10:53:28 Re: Removing redundant itemsets
Previous Message Allan Kamau 2008-03-31 08:16:17 Removing redundant itemsets