Skip site navigation (1) Skip section navigation (2)

Re: Removing redundant itemsets

From: Allan Kamau <allank(at)sanbi(dot)ac(dot)za>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Removing redundant itemsets
Date: 2008-03-31 09:58:50
Message-ID: 47F0B5DA.7020101@sanbi.ac.za (view raw, whole thread or download thread mbox)
Thread:
Lists: pgsql-sql
Craig,
Thank you so much for the solution. I have spent many hours since 
Thursday last week including the weekend (and it took you just a few 
minutes) trying to figure out a solution not involving procedural 
programming and looping (as the size of the items and even the number of 
"purchases" in the datasets I may be working with may be large), I was 
looking for a solution that may take (almost) polynomial time (and 
resources) and also make use of Postgresql refined and efficient engine. 
Your solution satisfies these requirements. Thanks.

Allan.

Craig Ringer wrote:
> Allan Kamau wrote:
>   
>> Hi all,
>> I have a list of purchases (market basket) and I would like to select
>> non redundant longest possible patterns by eliminating
>> (creating/populating other table to contain only non redandant itemsets)
>> purchases having item lists which are fully included in at least one
>> other purchase.
>>     
>
> Here's a possibly slow and surely ugly solution (I think it's right,
> though I haven't done more than passing testing):
>
>
>
> CREATE VIEW togo_as_arr AS
>   SELECT a.tid,
>     ARRAY(SELECT item FROM togo b WHERE b.tid = a.tid ORDER BY item)
>     AS items
>   FROM togo a GROUP BY tid;
>
> SELECT arr_a.tid AS redundant_tid, arr_b.tid AS contained_by
> FROM togo_as_arr arr_a CROSS JOIN togo_as_arr arr_b
> WHERE arr_a.tid <> arr_b.tid AND arr_a.items <@ arr_b.items;
>
>
>
> (the view isn't necessary, but does improve the readability of the query).
>
> It groups the purchases up with item lists as arrays, then finds any
> purchases with items arrays wholly contained by other item arrays from
> other purchases.
>
> I'm *sure* there's a smarter way to do this that avoids the use of
> arrays, but I don't seem to be able to come up with one right now. It's
> interesting, though, so I might keep fiddling.
>
> --
> Craig Ringer
>   


In response to

pgsql-sql by date

Next:From: Craig RingerDate: 2008-03-31 10:53:28
Subject: Re: Removing redundant itemsets
Previous:From: Allan KamauDate: 2008-03-31 08:16:17
Subject: Removing redundant itemsets

Privacy Policy | About PostgreSQL
Copyright © 1996-2018 The PostgreSQL Global Development Group