Poor query performance on one of two "like" databases in production.

From: <Keaton_Adams(at)McAfee(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Poor query performance on one of two "like" databases in production.
Date: 2010-05-14 17:28:56
Message-ID: C812E478.166AF%keaton_adams@mcafee.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

I hope you can provide some answers to a strange problem. This is in production and is a Severity #1 issue we are having, so any help you can provide would be appreciated.

PG: PostgreSQL 8.3.7
OS: RHEL 5 64 bit

We have two databases with the same DB schema managing different sets of users. A query that was running fine for some time has started to run very poorly (basically never finishes) in one of the two databases. The EXPLAIN output from both DBs is shown below. The Merge Join approach will run in 24 seconds. The Nested Loop / Hash Join is taking a totally different approach and is never finishing.

I did run an ANALYZE, then vacuum full, then another ANALYZE on all tables involved in the query. That didn't change things. I compared QUERY TUNING settings in both postgresql.conf files and they are identical. There is a difference in row counts in each table between the databases, but not by much.

Bad plan:

postgres(at)p01c06d130> psql -Upostgres -dmxl -fbadquery.sql
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=14310.81..2306550343.42 rows=244089 width=1564)
-> Hash Join (cost=14310.81..2305727555.65 rows=1988818 width=1564)
Hash Cond: ((u.customer_id = d.customer_id) AND (u.domain_id = d.domain_id))
-> Append (cost=2937.06..2285528361.02 rows=1026394 width=1044)
-> Index Scan using mxl_user_domain_id_idx on mxl_user u (cost=2937.06..2013937900.16 rows=900481 width=52)
Filter: ((active <> 0) AND (NOT (subplan)))
SubPlan
-> Materialize (cost=2937.06..4764.75 rows=131369 width=4)
-> Seq Scan on mxl_user_group (cost=0.00..2291.69 rows=131369 width=4)
-> Subquery Scan "*SELECT* 2" (cost=2945.94..271590460.85 rows=125913 width=1044)
Filter: (NOT (subplan))
-> Merge Join (cost=8.88..101782.76 rows=251826 width=55)
Merge Cond: (u.user_id = ua.user_id)
-> Index Scan using mxl_user_pkey on mxl_user u (cost=0.00..84078.47 rows=1868036 width=28)
-> Index Scan using mxl_user_alias_uid_idx on mxl_user_alias ua (cost=0.00..10224.50 rows=251826 width=31)
Filter: (ua.active <> 0)
SubPlan
-> Materialize (cost=2937.06..4764.75 rows=131369 width=4)
-> Seq Scan on mxl_user_group (cost=0.00..2291.69 rows=131369 width=4)
-> Hash (cost=4988.15..4988.15 rows=77507 width=528)
-> Append (cost=0.00..4988.15 rows=77507 width=528)
-> Seq Scan on mxl_domain d (cost=0.00..1810.59 rows=64367 width=28)
Filter: (active = 1)
-> Subquery Scan "*SELECT* 2" (cost=2454.26..3177.56 rows=13140 width=528)
-> Hash Join (cost=2454.26..3046.16 rows=13140 width=33)
Hash Cond: (da.domain_id = d.domain_id)
-> Seq Scan on mxl_domain_alias da (cost=0.00..296.25 rows=13140 width=25)
Filter: (active = 1)
-> Hash (cost=1649.67..1649.67 rows=64367 width=12)
-> Seq Scan on mxl_domain d (cost=0.00..1649.67 rows=64367 width=12)
-> Index Scan using wds_policy_set_id_idx on wds_policy_set p (cost=0.00..0.39 rows=2 width=8)
Index Cond: (p.id = u.customer_id)
Filter: ((p.default_flag = 1) AND (p.web_access_flag = 1) AND (p.active = 1) AND (p.scope = 3))
(33 rows)

Good plan:

kadams(at)p02c06d130> psql -Upostgres -dmxl -fbadquery.sql
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=741554.63..751762.07 rows=202281 width=1564)
Merge Cond: ((p.id = u.customer_id) AND (d.domain_id = u.domain_id))
-> Sort (cost=22143.37..22332.37 rows=75599 width=536)
Sort Key: p.id, d.domain_id
-> Hash Join (cost=362.69..5423.15 rows=75599 width=536)
Hash Cond: (d.customer_id = p.id)
-> Append (cost=0.00..3008.78 rows=47116 width=528)
-> Seq Scan on mxl_domain d (cost=0.00..1003.92 rows=35674 width=28)
Filter: (active = 1)
-> Subquery Scan "*SELECT* 2" (cost=1360.66..2004.86 rows=11442 width=528)
-> Hash Join (cost=1360.66..1890.44 rows=11442 width=33)
Hash Cond: (da.domain_id = d.domain_id)
-> Seq Scan on mxl_domain_alias da (cost=0.00..258.02 rows=11442 width=25)
Filter: (active = 1)
-> Hash (cost=914.74..914.74 rows=35674 width=12)
-> Seq Scan on mxl_domain d (cost=0.00..914.74 rows=35674 width=12)
-> Hash (cost=329.68..329.68 rows=2641 width=8)
-> Seq Scan on wds_policy_set p (cost=0.00..329.68 rows=2641 width=8)
Filter: ((default_flag = 1) AND (web_access_flag = 1) AND (active = 1) AND (scope = 3))
-> Materialize (cost=719411.26..730421.65 rows=880831 width=1044)
-> Sort (cost=719411.26..721613.34 rows=880831 width=1044)
Sort Key: u.customer_id, u.domain_id
-> Append (cost=2068.96..159849.28 rows=880831 width=1044)
-> Seq Scan on mxl_user u (cost=2068.96..66848.58 rows=806426 width=50)
Filter: ((active <> 0) AND (NOT (hashed subplan)))
SubPlan
-> Seq Scan on mxl_user_group (cost=0.00..1813.57 rows=102157 width=4)
-> Subquery Scan "*SELECT* 2" (cost=2077.18..93000.70 rows=74405 width=1044)
Filter: (NOT (hashed subplan))
-> Merge Join (cost=8.22..89071.62 rows=148810 width=53)
Merge Cond: (u.user_id = ua.user_id)
-> Index Scan using mxl_user_pkey on mxl_user u (cost=0.00..77888.20 rows=1656041 width=26)
-> Index Scan using mxl_user_alias_uid_idx on mxl_user_alias ua (cost=0.00..5227.92 rows=148810 width=31)
Filter: (ua.active <> 0)
SubPlan
-> Seq Scan on mxl_user_group (cost=0.00..1813.57 rows=102157 width=4)
(36 rows)

Query that runs on both DBs:

EXPLAIN SELECT substring(users.email from '^.*\@') || domains.domain as email,
users.customer_id,
users.password,
p.policy_set_id,
users.user_id,
domains.auth_type
FROM
(
SELECT d.customer_id, d.domain, d.domain_id, d.auth_type, d.active from mxl_domain d
UNION ALL
SELECT d.customer_id, da.domain, da.domain_id, d.auth_type, da.active from mxl_domain d, mxl_domain_alias da WHERE da.domain_id = d.domain_id
) as domains,
(
SELECT u.email, u.customer_id, u.user_id, u.domain_id, u.password, u.active from mxl_user u
UNION ALL
SELECT ua.email, u.customer_id, u.user_id, u.domain_id, u.password, ua.active from mxl_user u, mxl_user_alias ua WHERE ua.user_id = u.user_id
) as users,
wds_policy_set p
WHERE users.customer_id = p.id
AND users.customer_id = domains.customer_id
AND users.domain_id = domains.domain_id
AND p.default_flag = 1
AND p.web_access_flag = 1
AND p.scope = 3
AND domains.active = 1
AND users.active != 0
AND p.active = 1
AND users.user_id NOT IN (SELECT user_id FROM mxl_user_group);

#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
enable_hashagg = off
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 1.1 # same scale as above
#cpu_tuple_cost = 0.01 # same scale as above
#cpu_index_tuple_cost = 0.005 # same scale as above
#cpu_operator_cost = 0.0025 # same scale as above
effective_cache_size = 2500MB

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5 # range 1-10
#geqo_pool_size = 0 # selects default based on effort
#geqo_generations = 0 # selects default based on effort
#geqo_selection_bias = 2.0 # range 1.5-2.0

# - Other Planner Options -

default_statistics_target = 250 # range 1-1000
constraint_exclusion = on
#from_collapse_limit = 8
#join_collapse_limit = 8 # 1 disables collapsing of explicit
# JOIN clauses

Row counts from DB where a Merge Join is selected:

mxl=# select count(*) from mxl_user;
count
---------
1655106
(1 row)

mxl=# select count(*) from mxl_user_alias;
count
--------
148864
(1 row)

mxl=# select count(*) from mxl_domain;;
count
-------
35487
(1 row)

mxl=# select count(*) from mxl_domain_alias;
count
-------
11445
(1 row)

mxl=# select count(*) from mxl_user_group;
count
--------
102277
(1 row)

mxl=# select count(*) from wds_policy_set;
count
-------
8434
(1 row)

Row counts from DB where Hash Join is selected:

mxl=# select count(*) from mxl_user;
count
---------
1869012
(1 row)

mxl=# select count(*) from mxl_user_alias;
count
--------
251827
(1 row)

mxl=# select count(*) from mxl_domain;
count
-------
64377
(1 row)

mxl=# select count(*) from mxl_domain_alias;
count
-------
13149
(1 row)

mxl=# select count(*) from mxl_user_group;
count
--------
131305
(1 row)

mxl=# select count(*) from wds_policy_set;
count
-------
13387
(1 row)

Thanks,

-K

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Marc G. Fournier 2010-05-14 17:42:30 Re: List traffic
Previous Message Yeb Havinga 2010-05-14 17:09:17 Re: List traffic