Automatic analyze on select into

From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Automatic analyze on select into
Date: 2003-04-26 15:15:37
Message-ID: 20030426101537.F66185@flake.decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

BEGIN;
SET LOCAL enable_seqscan = off;
SELECT id, team_id, sum(work_units) AS work_units
INTO TEMP email_contrib_summary
FROM email_contrib
WHERE project_id = :ProjectID
GROUP by id, team_id
;
COMMIT;

inserts 29000 rows...

UPDATE email_contrib_summary
SET id = sp.retire_to
FROM stats_participant sp
WHERE sp.id = email_contrib_summary.id
AND sp.retire_to >= 0
AND (sp.retire_date >= (SELECT ps.last_date FROM project_statsrun ps WHERE ps.project_id = :ProjectID)
OR sp.retire_date IS NULL)
;
Nested Loop (cost=0.00..5475.20 rows=982 width=54) (actual time=25.54..2173363.11 rows=29181 loops=1)
InitPlan
-> Seq Scan on project_statsrun ps (cost=0.00..1.06 rows=1 width=4) (actual time=0.06..0.07 rows=1 loops=1)
Filter: (project_id = 8)
-> Seq Scan on email_contrib_summary (cost=0.00..20.00 rows=1000 width=46) (actual time=25.11..1263.26 rows=29753 loops=1)
-> Index Scan using stats_participant__participantretire_id on stats_participant sp (cost=0.00..5.44 rows=1 width=8) (actual time=2.16..72.93 rows=1 loops=29753)
Index Cond: ((sp.retire_to >= 0) AND (sp.id = "outer".id))
Filter: ((retire_date >= $0) OR (retire_date IS NULL))
Total runtime: 2174315.61 msec

GAH! 45 minutes to update 29k rows! BUT, if I do

Hash Join (cost=41104.03..42410.14 rows=29166 width=38) (actual time=8391.81..10925.07 rows=29181 loops=1)
Hash Cond: ("outer".id = "inner".id)
InitPlan
-> Seq Scan on project_statsrun ps (cost=0.00..1.06 rows=1 width=4) (actual time=0.05..0.06 rows=1 loops=1)
Filter: (project_id = 8)
-> Seq Scan on email_contrib_summary (cost=0.00..496.01 rows=29701 width=30) (actual time=0.20..387.95 rows=29753 loops=1)
-> Hash (cost=13939.69..13939.69 rows=394217 width=8) (actual time=8390.72..8390.72 rows=0 loops=1)
-> Seq Scan on stats_participant sp (cost=0.00..13939.69 rows=394217 width=8) (actual time=0.22..5325.38 rows=389115 loops=1)
Filter: ((retire_to >= 0) AND ((retire_date >= $0) OR (retire_date IS NULL)))
Total runtime: 11584.09 msec

Ahhh... soothing relief...

So, question is, would it make sense to automatically do an analyze
after/during a SELECT INTO? Would it be very expensive to analyze the
data as it's being inserted? I think it's pretty well understood that
you want to vacuum/vacuum analyze the entire database regularly, but
that obviously wouldn't help temporary tables... maybe it makes the most
sense to automatically analyze temporary tables only. For that matter,
since temp tables only have one operation performed on them at a time,
maybe it makes sense to keep stats for them up-to-date as part of every
operation?
--
Jim C. Nasby (aka Decibel!) jim(at)nasby(dot)net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-04-26 16:18:54 Re: Automatic analyze on select into
Previous Message scott.marlowe 2003-04-25 22:40:54 Re: Indexes with different datatypes:Correction