Re: generic crosstab ?

From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Andreas <maps(dot)on(at)gmx(dot)net>
Cc: Joe Conway <mail(at)joeconway(dot)com>, David Johnston <polobo(at)yahoo(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: generic crosstab ?
Date: 2012-04-25 05:07:09
Message-ID: CAEV0TzB8bD8Wnqdtp89=p2kFxB0Xnc1r-9ekjO6oWvuXxfwX6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Tue, Apr 24, 2012 at 3:37 PM, Andreas <maps(dot)on(at)gmx(dot)net> wrote:

> Am 25.04.2012 00:04, schrieb Joe Conway:
>
> On 04/24/2012 02:42 PM, David Johnston wrote:
>>
>>> You must specify the output record structure:
>>>
>>> SELECT crosstab(text, text) AS ( col1_name col1_type [, colN_name
>>> colN_type]* )
>>>
>>> Whether this relates to the “materialization node” message you are
>>> receiving I have no idea.
>>>
>> The error is because you are selecting from a set returning function in
>> the target list rather than the from clause. It should be more like:
>>
>> SELECT * FROM crosstab(text, text) AS ( col1_name col1_type [, colN_name
>>
>>> colN_type]* )
>>>
>>
>>
> OK now i get at least some result.
> But
> 1) I need to know how many categories will apear to construct the target
> list. This is a wee bit of a problem as this number is actually dynamic.
>
> 2) There are some rows in the resulting list with empty columns within the
> row.
> When I execute the first query for a parent ID that has gaps in the
> crosstab I see it shows no gaps in the categories when called outside
> crosstab().
> E.g. it dumps x1, x2, x3, x4, x5 when called seperately but crosstab()
> shows
> x1, x2, null, null, x5, null, x6, x7
>
> How does this make sense ?
>

I believe that the crosstab function is fairly naive about constructing the
crosstab and requires that each row in the first query be in row-order for
the crosstab output. It is possible that it even requires the rows in the
first query to be in column order in the crosstba output.

In other words, if the first query returns results like this:

row1, col3, value1
row1, col1, value2
row2, col1, value3
row2, col3, value4
row1, col2, value5

I believe that the last row1 entry will be dropped, or else maybe you'll
get two rows in the crosstab output, each representing row1, and I am
hypothesizing (without re-reading the documentation) that the fact that
col3 comes before col1 in row1 will also result in a hole in the output.

Basically, your first query should include "order by 1,2" rather than just
"order by 1"

Have you gone through the documentation for the crosstab functions? I
believe that will answer most of your questions. It has been a long time
since I've read them, but I remember them being fairly complete.

As for your question about knowing how many columns you will have, I've
always solved that by opening a serialized transaction and running a query
to get the possible set of columns, then issue the sql statement that
includes the crosstab(sql,sql) call, using the first query results to build
up the return result type. Depending on the nature of your data set, it is
possible you can forgo the serialized transaction, since there may be
little to no risk of new column names appearing between the two queries.
You could also just structure the queries in the second statement so that
it is guaranteed to return only the results returned by your initial query
- include a where clause which is guaranteed to return only rows that match
the columns returned by your first query. You may wind up missing a value
that was added between the first query and the second query, but odds are
good that if your data is that volatile, it doesn't matter if you miss a
value which happened to be inserted in those few milliseconds.

--sam

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Venkateswara Rao Dokku 2012-04-25 05:55:22 Regarding report generated by the pgFouine
Previous Message Trinath Somanchi 2012-04-25 03:46:44 Re: How to group by similarity?