Re: How to join table to itself N times?

From: Misa Simic <misa(dot)simic(at)gmail(dot)com>
To: "W(dot) Matthew Wilson" <matt(at)tplus1(dot)com>
Cc: pgsql <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to join table to itself N times?
Date: 2013-03-22 20:19:43
Message-ID: CAH3i69myQ+LViJscva00rhexm59Zq=jtBsXf3deN1R+s0BmR5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

correction:

WITH RECURSIVE t (

SELECT array_agg('{}'::text[], value) AS values, ord + 1 AS next_dim_ord,
ord AS agg_dims
FROM market_segment_dimension_values
INNER JOIN market_segment_dimensions USING (market_segment_dimension)
WHERE ord = 1
UNION ALL
SELECT array_agg(t.values, value) AS values, ord + 1 AS next_dim_ord, ord
AS agg_dims
FROM t
INNER JOIN market_segment_dimensions ON (ord = t.next_dim_ord)
INNER JOIN market_segment_dimension_values USING (market_segment_dimension)
)

SELECT values FROM t WHERE t.agg_dims = (SELECT MAX(ord)
FROM market_segment_dimensions)

2013/3/22 Misa Simic <misa(dot)simic(at)gmail(dot)com>

> correction:
>
>
> 2013/3/22 Misa Simic <misa(dot)simic(at)gmail(dot)com>
>
>> Hi,
>>
>> Not clear what is expected result - if you add new dimension...
>>
>> a) three columns? - well not possible to write SQL query which returns
>> undefined number of columns... unfortunatelly - though I am not clear why :)
>>
>> b) But you can get the similar result as from python... my guess is you
>> expect:
>>
>>
>>
>> ('north', 'retail', small),
>> ('north', 'retail', big),
>> ('north', 'manufacturing', small),
>> ('north', 'manufacturing', big),
>> ('north', 'wholesale', small),
>> ('north', 'wholesale', big),
>> ('south', 'retail', small),
>> ('south', 'retail', big),
>> ('south', 'manufacturing', small),
>> ('south', 'manufacturing', big)
>> ('south', 'wholesale', small)
>> ('south', 'wholesale', big)
>>
>>
>>
>> In your dimensions table (called: market_dimensions) you would need one
>> more column to define desired result order
>>
>> i.e.
>>
>> market_segment_dimensions
>> market_segment_dimension , ord
>> geography, 1
>> industry type, 2
>> customer size, 3
>>
>>
>> WITH RECURSIVE t (
>>
>> SELECT array_agg(value) AS values, ord + 1 AS next_dim_ord, ord AS
>> agg_dims
>> FROM market_segment_dimension_values
>> INNER JOIN market_segment_dimensions USING (market_segment_dimension)
>> WHERE ord = 1
>> UNION ALL
>> SELECT array_agg(value) AS values, ord + 1 AS next_dim_ord, ord AS
>> agg_dims
>> FROM t
>> INNER JOIN market_segment_dimensions ON (ord = t.next_dim_ord)
>> INNER JOIN market_segment_dimension_values USING (
>> market_segment_dimension)
>> )
>>
>> SELECT values FROM t WHERE t.agg_dims = (SELECT MAX(ord) FROM
>> market_segment_dimensions)
>>
>>
>>
>>
>>
>>
>> 2013/3/21 W. Matthew Wilson <matt(at)tplus1(dot)com>
>>
>> I got this table right now:
>>>
>>> select * from market_segment_dimension_values ;
>>> +--------------------------+---------------+
>>> | market_segment_dimension | value |
>>> +--------------------------+---------------+
>>> | geography | north |
>>> | geography | south |
>>> | industry type | retail |
>>> | industry type | manufacturing |
>>> | industry type | wholesale |
>>> +--------------------------+---------------+
>>> (5 rows)
>>>
>>> The PK is (market_segment_dimension, value).
>>>
>>> The dimension column refers to another table called
>>> market_segment_dimensions.
>>>
>>> So, "north" and "south" are to values for the "geography" dimension.
>>>
>>> In that data above, there are two dimensions. But sometimes there could
>>> be just one dimension, or maybe three, ... up to ten.
>>>
>>> Now here's the part where I'm stumped.
>>>
>>> I need to create a cartesian product of the dimensions.
>>>
>>> I came up with this approach by hard-coding the different dimensions:
>>>
>>> with geog as (
>>> select value
>>> from market_segment_dimension_values
>>> where market_segment_dimension = 'geography'),
>>>
>>> industry_type as (
>>> select value
>>> from market_segment_dimension_values
>>> where market_segment_dimension = 'industry type')
>>>
>>> select geog.value as g,
>>> industry_type.value as ind_type
>>> from geog
>>> cross join industry_type
>>> ;
>>> +-------+---------------+
>>> | g | ind_type |
>>> +-------+---------------+
>>> | north | retail |
>>> | north | manufacturing |
>>> | north | wholesale |
>>> | south | retail |
>>> | south | manufacturing |
>>> | south | wholesale |
>>> +-------+---------------+
>>> (6 rows)
>>>
>>> But that won't work if I add a new dimension (unless I update the
>>> query). For example, maybe I need to add a new dimension called, say,
>>> customer size, which has values "big" and "small". A
>>>
>>> I've got some nasty plan B solutions, but I want to know if there's some
>>> solution.
>>>
>>> There's a really elegant solution in python using itertools.product,
>>> like this:
>>>
>>> >>> list(itertools.product(*[['north', 'south'], ['retail',
>>> 'manufacturing', 'wholesale']]))
>>>
>>> [('north', 'retail'),
>>> ('north', 'manufacturing'),
>>> ('north', 'wholesale'),
>>> ('south', 'retail'),
>>> ('south', 'manufacturing'),
>>> ('south', 'wholesale')]
>>>
>>> All advice is welcome. Thanks in advance!
>>>
>>> Matt
>>>
>>>
>>>
>>> --
>>> W. Matthew Wilson
>>> matt(at)tplus1(dot)com
>>> http://tplus1.com
>>>
>>>
>>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ryan Kelly 2013-03-22 20:57:14 Re: PostgreSQL EXCLUDE USING error: Data type integer has no default operator class
Previous Message Misa Simic 2013-03-22 20:18:18 Re: How to join table to itself N times?