Re: getting the ranks out of items with SHARED

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Janning Vygen <vygen(at)planwerk6(dot)de>
Cc: Janning Vygen <vygen(at)gmx(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: getting the ranks out of items with SHARED
Date: 2005-07-14 16:53:53
Message-ID: 20658.1121360033@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Janning Vygen <vygen(at)planwerk6(dot)de> writes:
> and this is the combined statement:

> UPDATE temp_gc
> SET gc_rank = ranking.rank
> FROM (

> SELECT
> *,
> ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank
> FROM (
> SELECT
> mg_name,
> gc_gesamtpunkte,
> gc_gesamtsiege
> FROM temp_gc
> ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC
> ) AS r1
> ) AS ranking
> WHERE temp_gc.mg_name = ranking.mg_name;

> to me it looks like the call to ranking() is moved from the subquery plan
> "upwards". but i really can't interpret this explain output regarding to
> "where" the ranking funcion is called.

Yeah, I think you are exactly right. The EXPLAIN output doesn't show
targetlist contents (EXPLAIN VERBOSE would, but not very readably :-(),
but the small difference in the estimated costs seems to indicate that
the function evaluation is not happening at the level of the "subquery
scan" plan node in the combined plan. This is consistent with the
planner's behavior in general. It will honor the "order by" in the
sense that the output of the "r1" subquery is delivered to the upper
plan level in that order, but it feels no compunction about flattening
the "ranking" subquery into the outer query, and after that you cannot
be certain about the order in which the executions of the ranking()
function happen.

What you need is to prevent the flattening of the intermediate subquery.
The current favorite technique is to insert a dummy OFFSET clause:

UPDATE temp_gc
SET gc_rank = ranking.rank
FROM (

SELECT
*,
ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank
FROM (
SELECT
mg_name,
gc_gesamtpunkte,
gc_gesamtsiege
FROM temp_gc
ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC
) AS r1
OFFSET 0
) AS ranking
WHERE temp_gc.mg_name = ranking.mg_name;

although you could also do it by inserting an ORDER BY at that level.

BTW, there's been some discussion of preventing flattening of subqueries
whose SELECT lists contain volatile functions. If we did that then
declaring ranking() as volatile would be enough to avoid the problem.
I've been hesitant to make the change because I'm concerned about the
probable loss of optimization in cases where the function is labeled
volatile by default, merely because the author didn't think about how
to mark it. And in any case "volatile" doesn't really describe the
issue with your function...

> my problem is that getting a rank out of items is very expensive with
> aggregate functions,

I'm unconvinced --- I don't see a reason that you can't implement it
exactly the same way as an aggregate. The only difference is that the
state value is an aggregate state value instead of a global variable.
There might be some extra palloc overhead, but nothing worse.

regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Thomas F. O'Connell 2005-07-14 16:59:39 Re: ERROR: could not open relation
Previous Message Thomas F. O'Connell 2005-07-14 16:42:40 Re: ERROR: could not open relation