Re: Calculating statistic via function rather than with query is slowing my query

From: Anish Kejariwal <anishkej(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Calculating statistic via function rather than with query is slowing my query
Date: 2011-08-17 19:00:54
Message-ID: CAOpcnr-1=YGsp=OAV58cdEfNw4EEuuzEqo53+9C7kURZLmX4Ew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks Pavel! that definitely solved it.

Unfortunately, the function I gave you was a simple/short version of what
the actual function is going to be. The actual function is going to get
parameters passed to it, and based on the parameters will go through some
if...else conditions, and maybe even call another function. Based on that,
I was definitely hoping to use plpgsql, and the overhead is unfortunate.

Is there any way to get around this overhead? Will I still have the same
overhead if I use plperl, plpython, pljava, or write the function in C?

Anish

On Wed, Aug 17, 2011 at 11:27 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>wrote:

> Hello
>
> 2011/8/17 Anish Kejariwal <anishkej(at)gmail(dot)com>:
> > Hi everyone,
> > I'm using postgres 9.0.3, and here's the OS I'm running this on:
> > Linux 2.6.18-238.12.1.el5xen #1 SMP Tue May 31 14:02:29 EDT 2011 x86_64
> > x86_64 x86_64 GNU/Linux
> > I have a fairly straight forward query. I'm doing a group by on an ID,
> and
> > then calculating some a statistic on the resulting data. The problem I'm
> > running into is that when I'm calculating the statistics via a function,
> > it's twice as slow as when I'm calculating the statistics directly in my
> > query. I want to be able to use a function, since I'll be using this
> > particular calculation in many places.
> > Any idea of what's going on? Below, I've included my function, and both
> > queries (I removed the type_ids, and just wrote …ids…
> > Here's my function (I also tried stable):
> > CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c
> > integer)
> > RETURNS double precision AS $$
> > BEGIN
> > return a/b/c* 1000000000::double precision;
> > END;
> > $$ LANGUAGE plpgsql immutable;
> >
>
> this is overhead of plpgsql call. For this simple functions use a SQL
> functions instead
>
> CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c
> integer)
> RETURNS double precision AS $$
> > SELECT $1/$2/$3* 1000000000::double precision;
> > $$ LANGUAGE sql;
>
> Regards
>
> Pavel Stehule
>
> > The query that takes 7.6 seconds, when I calculate the statistic from
> within
> > the query:
> > explain analyze
> > select
> > agg.primary_id,
> > avg(agg.a / agg.b / agg.c * 1000000000::double precision) foo,
> > stddev(agg.a / agg.b / agg.c * 1000000000::double precision) bar
> > from mytable agg
> > where agg.type_id in (....ids....)
> > group by agg.primary_id;
> > The execution plan:
> > HashAggregate (cost=350380.58..350776.10 rows=9888 width=20) (actual
> > time=7300.414..7331.659 rows=20993 loops=1)
> > -> Bitmap Heap Scan on mytable agg (cost=28667.90..337509.63
> > rows=1716127 width=20) (actual time=200.064..2861.600 rows=2309230
> loops=1)
> > Recheck Cond: (type_id = ANY ('{....ids....}'::integer[]))
> > -> Bitmap Index Scan on mytable_type_id_idx
> (cost=0.00..28238.87
> > rows=1716127 width=0) (actual time=192.725..192.725 rows=2309230 loops=1)
> > Index Cond: (type_id = ANY ('{....ids....}'::integer[]))
> > Total runtime: 7358.337 ms
> > (6 rows)
> >
> >
> >
> > The same query, but now I'm calling the function. When I call the
> function
> > it's taking 15.5 seconds.
> > explain analyze select
> > agg.primary_id,
> > avg(calc_test(agg.a,agg.b,agg.c)) foo,
> > stddev(calc_test(agg.a,agg.b,agg.c)) bar
> > from mytable agg
> > where agg.type_id in (....ids....)
> > group by agg.primary_id;
> > and, here's the execution plan:
> > HashAggregate (cost=350380.58..355472.90 rows=9888 width=20) (actual
> > time=13660.838..13686.618 rows=20993 loops=1)
> > -> Bitmap Heap Scan on mytable agg (cost=28667.90..337509.63
> > rows=1716127 width=20) (actual time=170.385..2881.122 rows=2309230
> loops=1)
> > Recheck Cond: (type_id = ANY ('{....ids....}'::integer[]))
> > -> Bitmap Index Scan on mytable_type_id_idx
> (cost=0.00..28238.87
> > rows=1716127 width=0) (actual time=162.834..162.834 rows=2309230 loops=1)
> > Index Cond: (type_id = ANY ('{....ids....}'::integer[]))
> > Total runtime: 13707.560 ms
> >
> > Thanks!
> > Anish
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Gary Doades 2011-08-17 19:04:41 Re: Raid 5 vs Raid 10 Benchmarks Using bonnie++
Previous Message Tomas Vondra 2011-08-17 18:56:49 Re: Tuning Tips for a new Server