JOIN to a VIEW makes a real slow query

From: "Chuck D(dot)" <pgsql-performance(at)nullmx(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: JOIN to a VIEW makes a real slow query
Date: 2007-02-13 18:53:55
Message-ID: 200702131253.55486.pgsql-performance@nullmx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi folks,

I don't know if this is an SQL or PERFORMANCE list problem but I wanted to
check here first. I've seen this discussed on the list before but I'm still
not sure of the solution. Maybe my query is just structured wrong.

I recently visited an old project of mine that has a 'city', 'state,'
and 'country' tables. The city data comes from multiple sources and totals
about 3 million rows. I decided to split the city table up based on the
source (world_city, us_city). This makes easier updating because the
assigned feature id's from the two sources overlap in some cases making it
impossible to update as a single merged table.

However, I decided to create a view to behave like the old 'city' table. The
view is just a simple:

SELECT [columns]
FROM world_city
UNION
SELECT [columns]
FROM us_city
;

Selecting from the view is very quick, but JOINing to the view is slow. About
65 seconds to select a city. It doesn't matter wether it is joined to one
table or 6 like it is in my user_detail query - it is still slow. It has
indexes on the city_id, state_id, country_id of each table in the view too.
Everything has been 'VACUUM ANALYZE' ed.

When using explain analyze from the view I get this:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM m_user AS mu
cmi-# left JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=650146.58..751018.45 rows=10618 width=55)
(actual time=53078.261..61269.190 rows=1 loops=1)
Join Filter: ("outer".city_id = "inner"."?column1?")
-> Seq Scan on m_user mu (cost=0.00..1.01 rows=1 width=27) (actual
time=0.010..0.022 rows=1 loops=1)
-> Unique (cost=650146.58..703236.51 rows=2123597 width=62) (actual
time=49458.007..59635.140 rows=2122712 loops=1)
-> Sort (cost=650146.58..655455.58 rows=2123597 width=62) (actual
time=49458.003..55405.965 rows=2122712 loops=1)
Sort Key: city_id, state_id, country_id, cc1, rc, adm1, lat,
lon, city_name
-> Append (cost=0.00..73741.94 rows=2123597 width=62) (actual
time=18.835..13706.395 rows=2122712 loops=1)
-> Seq Scan on us_city (cost=0.00..4873.09 rows=169409
width=62) (actual time=18.832..620.553 rows=169398 loops=1)
-> Seq Scan on world_city (cost=0.00..47632.88
rows=1954188 width=61) (actual time=23.513..11193.341 rows=1953314 loops=1)
Total runtime: 61455.471 ms
(10 rows)

Time: 61512.377 ms

So, a sequence scan on the tables in the view, won't use the index.

Then do the same query by replacing the view with the real table:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM m_user AS mu
cmi-# left JOIN geo.world_city AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..4.04 rows=1 width=36) (actual
time=53.854..53.871 rows=1 loops=1)
-> Seq Scan on m_user mu (cost=0.00..1.01 rows=1 width=27) (actual
time=0.010..0.016 rows=1 loops=1)
-> Index Scan using world_city_pk on world_city ci (cost=0.00..3.01
rows=1 width=17) (actual time=53.825..53.833 rows=1 loops=1)
Index Cond: ("outer".city_id = ci.city_id)
Total runtime: 53.989 ms
(5 rows)

Time: 56.234 ms

I'm not sure that a view on a UNION is the best idea but I don't know how to
go about keeping the tables from the data sources with the view (other than
modifying them with a source_id column). Any ideas on what is causing the
performance lag?

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2007-02-13 19:16:54 Re: JOIN to a VIEW makes a real slow query
Previous Message Alan Hodgson 2007-02-13 18:48:25 Re: CPU Usage