Help with performance on current status column

From: Chris Kratz <chris(dot)kratz(at)vistashare(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Help with performance on current status column
Date: 2005-09-13 21:08:26
Message-ID: 200509131708.26638.chris.kratz@vistashare.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello All,

We are struggling with a specific query that is killing us. When doing
explain analyze on the entire query, we *seem* to be getting killed by the
estimated number of rows on a case statement calculation.

I've included a snippet from the explain analyze of the much larger query. The
line in question, (cost=0.00..106.52 rows=1 width=16) (actual
time=0.048..67.728 rows=4725 loops=1) shows that it returned 4700 rows
instead of 1 which when coupled with a later join causes the statement to run
over 3 minutes.[1]

It seems that it thinks that the scan on role_id is going to return 1 row, but
in reality returns 4725 rows. The case statement causing the problem uses
todays date to see if a particular row is still active. Here is a test case
showing how far off the estimate is from the reality. [2]

I'm not too surprised to see that the estimate is off because it is
calculated, but does anyone know either how to make the estimate more
accurate so it picks a better plan, or is there a better way to do a "status"
function based off of the current date so that it is more efficient? I've
played with statistics on this table (racheting them up to 1000) with no
change in the plan.

Any thoughts?

-Chris

[1] explain analyze snippet from larger query
-> Nested Loop (cost=0.00..955.70 rows=1 width=204) (actual
time=3096.689..202704.649 rows=17 loops=1)
Join Filter: ("inner".nameid = "outer".name_id)
-> Nested Loop (cost=0.00..112.25 rows=1 width=33) (actual
time=0.271..90.760 rows=4725 loops=1)
-> Index Scan using role_definition_description_idx on
role_definition rdf (cost=0.00..5.72 rows=1 width=21) (actual
time=0.215..0.218 rows=1 loops=1)
Index Cond: (description = 'Participant'::text)
Filter: (program_id = 120)
-> Index Scan using roles_role_id_idx on roles rol
(cost=0.00..106.52 rows=1 width=16) (actual time=0.048..67.728 rows=4725
loops=1)
Index Cond: (rol.role_id = "outer".role_id)
Filter: (CASE WHEN (role_id IS NULL) THEN NULL::text WHEN
(("begin" IS NOT NULL) AND ("end" IS NOT NULL)) THEN CASE WHEN
((('now'::text)::date >= "begin") AND (('now'::text)::date <= "end")) THEN
'Active'::text ELSE 'Inactive'::text END WHEN ("begin" IS NOT NULL) THEN CASE
WHEN (('now'::text)::date >= "begin") THEN 'Active'::text ELSE
'Inactive'::text END WHEN ("end" IS NOT NULL) THEN CASE WHEN
(('now'::text)::date <= "end") THEN 'Active'::text ELSE 'Inactive'::text END
ELSE 'Active'::text END = 'Active'::text)
-> Nested Loop Left Join (cost=0.00..842.19 rows=97 width=175) (actual
time=6.820..42.863 rows=21 loops=4725)
-> Index Scan using namemaster_programid_idx on namemaster dem
(cost=0.00..470.12 rows=97 width=164) (actual time=6.811..42.654 rows=21
loops=4725)
Index Cond: (programid = 120)
Filter: ((name_float_lfm ~~* '%clark%'::text) OR
(metaphone(name_float_lfm, 4) = 'KLRK'::text) OR (soundex(name_float_lfm) =
'C462'::text))
-> Index Scan using validanswerid_pk on validanswer ina
(cost=0.00..3.82 rows=1 width=19) (actual time=0.003..0.004 rows=1
loops=99225)
Index Cond: (ina.validanswerid = "outer".inactive)

---------------------
[2] A much simpler statement triggers the incorrect row counts here.

explain analyze
select *
from roles rol
where

CASE
WHEN rol.role_id IS NULL
THEN NULL
WHEN rol."begin" IS NOT NULL and rol."end" IS NOT NULL
THEN
CASE WHEN TIMESTAMP 'now'>=rol."begin" and TIMESTAMP
'now'<=rol."end"
THEN 'Active'
ELSE 'Inactive' END
WHEN rol."begin" IS NOT NULL
THEN
CASE WHEN TIMESTAMP 'now'>=rol."begin"
THEN 'Active'
ELSE 'Inactive' END
WHEN rol."end" IS NOT NULL
THEN
CASE WHEN TIMESTAMP 'now'<=rol."end"
THEN 'Active'
ELSE 'Inactive' END
ELSE 'Active'
END = 'Active'

Seq Scan on roles rol (cost=0.00..2368.54 rows=413 width=20) (actual
time=0.046..562.635 rows=79424 loops=1)
Filter: (CASE WHEN (role_id IS NULL) THEN NULL::text WHEN (("begin" IS NOT
NULL) AND ("end" IS NOT NULL)) THEN CASE WHEN (('2005-09-13
16:43:18.721214'::timestamp without time zone >= "begin") AND ('2005-09-13
16:43:18.721214'::timestamp without time zone <= "end")) THEN 'Active'::text
ELSE 'Inactive'::text END WHEN ("begin" IS NOT NULL) THEN CASE WHEN
('2005-09-13 16:43:18.721214'::timestamp without time zone >= "begin") THEN
'Active'::text ELSE 'Inactive'::text END WHEN ("end" IS NOT NULL) THEN CASE
WHEN ('2005-09-13 16:43:18.721214'::timestamp without time zone <= "end")
THEN 'Active'::text ELSE 'Inactive'::text END ELSE 'Active'::text END =
'Active'::text)
Total runtime: 884.456 ms
(3 rows)
--
Chris Kratz

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message li li 2005-09-13 22:28:19 unsubscribe
Previous Message cydatamatt@gmail.com 2005-09-13 20:23:52 How many tables is too many tables?