Re: [HACKERS] Big IN() clauses etc : feature proposal

From: PFC <lists(at)peufeu(dot)com>
To: "Markus Schaber" <schabi(at)logix-tt(dot)com>, pgsql-performance(at)postgresql(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Big IN() clauses etc : feature proposal
Date: 2006-05-10 14:38:31
Message-ID: op.s9cayhotcigqcu@apollo13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance


>> The problem is that you need a set-returning function to retrieve
>> the values. SRFs don't have rowcount estimates, so the plans suck.
>
> What about adding some way of rowcount estimation to SRFs, in the way of:
>
> CREATE FUNCTION foo (para, meters) RETURNS SETOF bar AS
> $$ ... function code ... $$ LANGUAGE plpgsql
> ROWCOUNT_ESTIMATOR $$ ... estimation code ... $$ ;
>
> Internally, this could create two functions, foo (para, meters) and
> estimate_foo(para, meters) that are the same language and coupled
> together (just like a SERIAL column and its sequence). The estimator
> functions have an implicit return parameter of int8. Parameters may be
> NULL when they are not known at query planning time.
>
> What do you think about this idea?

It would be very useful.
A few thoughts...

You need to do some processing to know how many rows the function would
return.
Often, this processing will be repeated in the function itself.
Sometimes it's very simple (ie. the function will RETURN NEXT each
element in an array, you know the array length...)
Sometimes, for functions returning few rows, it might be faster to
compute the entire result set in the cost estimator.

So, it might be a bit hairy to find a good compromise.

Ideas on how to do this (clueless hand-waving mode) :

1- Add new attributes to set-returning functions ; basically a list of
functions, each returning an estimation parameter (rowcount, cpu tuple
cost, etc).
This is just like you said.

2- Add an "estimator", to a function, which would just be another
function, returning one row, a record, containing the estimations in
several columns (rowcount, cpu tuple cost, etc).
Pros : only one function call to estimate, easier and faster, the
estimator just leaves the unknown columns to NULL.
The estimator needs not be in the same language as the function itself.
It's just another function.

3- The estimator could be a set-returning function itself which would
return rows mimicking pg_statistics
Pros : planner-friendly, the planner would SELECT from the SRF instead of
looking in pg_statistics, and the estimator could tell the planner that,
for instance, the function will return unique values.
Cons : complex, maybe slow

4- Add simple flags to a function, like :
- returns unique values
- returns sorted values (no need to sort my results)
- please execute me and store my results in a temporary storage, count
the rows returned, and plan the outer query accordingly
- etc.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2006-05-10 14:55:51 Re: [HACKERS] Big IN() clauses etc : feature proposal
Previous Message Cristiano Duarte 2006-05-10 14:26:30 Re: BUG #2429: Explain does not report object's schema

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2006-05-10 14:42:59 Re: [PERFORM] Arguments Pro/Contra Software Raid
Previous Message Markus Schaber 2006-05-10 14:38:17 Re: [PERFORM] Arguments Pro/Contra Software Raid