Re: Performance woes relating to DISTINCT (I think)

From: boinger <boinger(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Performance woes relating to DISTINCT (I think)
Date: 2005-09-27 14:07:41
Message-ID: 9e6d8b53050927070755640862@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 9/27/05, Dawid Kuroczko <qnex42(at)gmail(dot)com> wrote:
> > QUERY PLAN
> > -> GroupAggregate (cost=0.00..85168.65 rows=11
> width=22)
> > (actual time=3149.916..45578.292 rows=515 loops=1)
>
> Hmm, planner expected 11 rows, got 515
>
>
> > (cost=0.00..85167.23 rows=107 width=22) (actual
> > time=3144.908..45366.147 rows=29893 loops=1)
>
>
> planner expected 107 rows, got 29893...
> I guess the problem here is that planner has wrong idea how your
> data looks. Try doing two things:
>
> VACUUM ANALYZE;
> (of tables in question or whole database)
>
> If that doesn't help, do increase the statistics target. By default
> PostgreSQL
> keeps 10 samples, but you might want to increase it to 50 or even 100.
> And then rerun VACUUM ANALYZE.
>
> If it doesn't help -- please repost the new query plan once again.

I actually kind of inadvertently "fixed" it.

I threw my hands up and thought to myself "FINE! If it's going to take
that long, at least it can do all the joins and whatnot instead of
having to loop back and do separate queries"

So, I piled in everything I needed it to do, and now it's inexplicably
(to me) fast (!?).

I'm still running a full VACUUM ANALYZE on your recommendation...maybe
shave a few more ms off.

Here's what I have, now (pre-vacuum):

SQL:
SELECT
tasks_applied.modcode AS modcode,
tasks_applied.seid AS seid,
tasks_applied.yearcode AS yearcode,
vin_years.year AS year,
COUNT(DISTINCT(tid)) AS task_count
FROM
"SS_valid_modelyears",
tasks_applied,
vin_years
WHERE
cid=0
AND tasks_applied.seid='500001'

AND "SS_valid_modelyears".modcode=tasks_applied.modcode

AND "SS_valid_modelyears".year=vin_years.year
AND tasks_applied.yearcode=vin_years.yearcode

AND "SS_valid_modelyears".valid=1
GROUP BY
tasks_applied.seid,
vin_years.year,
tasks_applied.modcode,
"SS_valid_modelyears".shortname,
tasks_applied.yearcode
ORDER BY
tasks_applied.seid ASC,
vin_years.year ASC

QUERY PLAN:
GroupAggregate (cost=201.39..201.42 rows=1 width=69) (actual
time=80.383..80.386 rows=1 loops=1)

-> Sort (cost=201.39..201.40 rows=1 width=69) (actual
time=79.737..79.898 rows=59 loops=1)

Sort Key: tasks_applied.seid, vin_years."year",
tasks_applied.modcode, "SS_valid_modelyears".shortname,
tasks_applied.yearcode

-> Nested Loop (cost=1.38..201.38 rows=1 width=69) (actual
time=72.599..78.765 rows=59 loops=1)

-> Hash Join (cost=1.38..165.15 rows=6 width=61)
(actual time=0.530..18.881 rows=1188 loops=1)

Hash Cond: ("outer"."year" = "inner"."year")

-> Seq Scan on "SS_valid_modelyears"
(cost=0.00..163.54 rows=36 width=56) (actual time=0.183..9.202
rows=1188 loops=1)

Filter: ("valid" = 1)

-> Hash (cost=1.30..1.30 rows=30 width=9)
(actual time=0.230..0.230 rows=0 loops=1)

-> Seq Scan on vin_years (cost=0.00..1.30
rows=30 width=9) (actual time=0.019..0.116 rows=30 loops=1)

-> Index Scan using strafe_group on tasks_applied
(cost=0.00..6.02 rows=1 width=22) (actual time=0.042..0.043 rows=0
loops=1188)

Index Cond: ((("outer".modcode)::text =
(tasks_applied.modcode)::text) AND (tasks_applied.yearcode =
"outer".yearcode) AND (tasks_applied.seid = 500001))

Filter: (cid = 0)

Total runtime: 80.764 ms

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Poul Møller Hansen 2005-09-27 14:16:14 Re: Slow query using LIMIT [SOLVED]
Previous Message Tom Lane 2005-09-27 14:04:48 Re: Mysterious query plan