Re: Update with Subquery Performance

From: "Linux Guru" <linux(dot)binary(at)gmail(dot)com>
To: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Update with Subquery Performance
Date: 2008-02-12 08:46:34
Message-ID: 3caa866c0802120046w60f016efx38d3eec2bdd2313e@mail.gmail.com
Lists: pgsql-performance

See, it's calculating the sum grouped by the product field. Here is an example:

Product   GP
-------   --
A         30
B         40
A         30
C         50
C         50

Now the query calculates the aggregated sum per product group, so all A rows
must get the same result, and likewise the B and C rows.

> Is this supposed to be updating every single row with one value?
> Cause I'm guessing it's running that sub select over and over instead
> of one time.
>
Yes, you are right: it recalculates the aggregate for every row in each
group, i.e. GP(A) is calculated twice for A, where it should be calculated
only once per group. Is there any way to achieve this?
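One common rewrite (a sketch only, assuming PostgreSQL's UPDATE ... FROM
syntax and the column names from the query above) is to pre-aggregate per
product in a derived table, so each group's sums are computed exactly once,
and then join that back to the table being updated:

-------------------------------
update dummy set gp = agg.gp
from (select product,
             (case when sum(pd) <> 0
                   then sum(gd)/sum(pd)*100
                   else 0 end) as gp
      from dummy
      group by product) as agg
where dummy.product = agg.product;
-------------------------------

This scans dummy twice in total (once for the aggregation, once for the
update) instead of once per row.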

> analyze;
> set work_mem = 128000;
> between the alter and update and see if that helps.

That did not help.

> Also, as Tom said, post explain analyze output of the statement.

"Seq Scan on dummy (cost=0.00..56739774.24 rows=23441 width=275) (actual
time=18.927..577929.014 rows=22712 loops=1)"
" SubPlan"
" -> Aggregate (cost=2420.41..2420.43 rows=1 width=19) (actual time=
25.423..25.425 rows=1 loops=22712)"
" -> Seq Scan on dummy "temp" (cost=0.00..2416.01 rows=586
width=19) (actual time=0.049..17.834 rows=2414 loops=22712)"
" Filter: ((product)::text = ($0)::text)"
"Total runtime: 578968.885 ms"

Thanks

On Feb 12, 2008 2:29 AM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:

> On Feb 11, 2008 5:06 AM, Linux Guru <linux(dot)binary(at)gmail(dot)com> wrote:
> > We have a large datawarehouse stored in postgres and temp tables are
> > created based on user query. The process of temp table creation involves
> > selecting data from the main fact table; this includes several select and
> > update statements, and one of the update statements is having performance
> > issues.
> >
> > The newly created temp table for this scenario contains 22712 rows. Here
> > is the query
> >
> > alter table dummy add column gp numeric(40,15);
> > update dummy set gp=(select (
> > case when sum(temp.pd) <> 0 then sum(temp.gd)/sum(temp.pd)*100 else 0 end )
> > from dummy as temp
> > where temp.product=dummy.product)
>
> Is this supposed to be updating every single row with one value?
> Cause I'm guessing it's running that sub select over and over instead
> of one time. I'm guessing that with more work_mem the planner might
> use a more efficient plan. Try adding
>
> analyze;
> set work_mem = 128000;
> between the alter and update and see if that helps.
>
> Also, as Tom said, post explain analyze output of the statement.
>
>
> >
> > Now this query basically updates a table using values within itself in
> > the subquery, but it takes too much time, i.e. approx 5 mins. The whole
> > temp table creation process is stuck in this query (there are 4
> > additional such updates with the same problem). Index creation is
> > useless here since it's only a one-time process.
> >
> > Here is the stripped-down version (the part causing the performance
> > issue) of the above query, i.e. only the select statement:
> > -------------------------------
> > select (case when sum(temp.pd) <> 0 then sum(temp.gd)/sum(temp.pd)*100
> > else 0 end ) from dummy as temp, dummy as temp2
> > where temp.product=temp2.product group by temp.product
> >
> > "HashAggregate (cost=1652480.98..1652481.96 rows=39 width=39)"
> > " -> Hash Join (cost=1636.07..939023.13 rows=71345785 width=39)"
> > " Hash Cond: (("temp".product)::text = (temp2.product)::text)"
> > " -> Seq Scan on dummy "temp" (cost=0.00..1311.03 rows=26003
> > width=39)"
> > " -> Hash (cost=1311.03..1311.03 rows=26003 width=21)"
> > " -> Seq Scan on dummy temp2 (cost=0.00..1311.03 rows=26003
> > width=21)"
> > -------------------------------
> >
> > What's the solution to this problem, or any alternate way to write this
> > query?
>
