Query on view radically slower than query on underlying table

From: Craig James <craig_james(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Query on view radically slower than query on underlying table
Date: 2011-02-28 18:28:44
Message-ID: 4D6BE95C.3090102@emolecules.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We have a medium-sized catalog (about 5 million rows), but some of our customers only want to see portions of it. I've been experimenting with a customer-specific schema that contains nothing but a "join table" -- just the primary keys of that portion of the data that each customer wants to see, which is used to create a view that looks like the original table. But the most important query, the one that customers use to scan page-by-page through search results, turns out to be far too slow (65 seconds versus 55 milliseconds).

Below are the results of two explain/analyze statements. The first one uses the view, the second one goes directly to the original tables. I thought this would be a slam-dunk, that it would return results in a flash because the view is created from two tables with the same primary keys.

My guess (and it's just a wild guess) is that the "left join" is forcing a sequence scan or something. But we need the left join, because it's on a "hitlist" that recorded all the matches to a customer's earlier query, and if rows have been removed from the tables, the customer needs to see a blank row.

Here is the "bad" query, which is run on the view:

em=> explain analyze
select version.version_id, version.isosmiles
from hitlist_rows_reset_140
left join version on (hitlist_rows_reset_140.objectid = version.version_id)
where hitlist_rows_reset_140.sortorder >= 1
and hitlist_rows_reset_140.sortorder <= 10
order by hitlist_rows_reset_140.sortorder;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
------------------------------
Nested Loop Left Join (cost=23687.51..215315.74 rows=1 width=54) (actual time=2682.662..63680.076 rows=10 loops=1)
Join Filter: (hitlist_rows_reset_140.objectid = v.version_id)
-> Index Scan using hitlist_rows_reset_140_pkey on hitlist_rows_reset_140 (cost=0.00..8.36 rows=1 width=8) (actual time=
0.015..0.049 rows=10 loops=1)
Index Cond: ((sortorder >= 1) AND (sortorder <= 10))
-> Hash Join (cost=23687.51..204666.54 rows=851267 width=50) (actual time=31.829..6263.403 rows=851267 loops=10)
Hash Cond: (v.version_id = mv.version_id)
-> Seq Scan on version v (cost=0.00..116146.68 rows=5631968 width=50) (actual time=0.006..859.758 rows=5632191 loo
ps=10)
-> Hash (cost=13046.67..13046.67 rows=851267 width=4) (actual time=317.488..317.488 rows=851267 loops=1)
-> Seq Scan on my_version mv (cost=0.00..13046.67 rows=851267 width=4) (actual time=2.888..115.166 rows=8512
67 loops=1)
Total runtime: 63680.162 ms

Here is the "good" query, which is run directly on the data tables.

em=> explain analyze
select registry.version.version_id, registry.version.isosmiles
from hitlist_rows_reset_140
left join registry.version on (hitlist_rows_reset_140.objectid = registry.version.version_id)
where hitlist_rows_reset_140.sortorder >= 1
and hitlist_rows_reset_140.sortorder <= 10
order by hitlist_rows_reset_140.SortOrder;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
------------------------------
Nested Loop Left Join (cost=0.00..17.73 rows=1 width=54) (actual time=36.022..55.558 rows=10 loops=1)
-> Index Scan using hitlist_rows_reset_140_pkey on hitlist_rows_reset_140 (cost=0.00..8.36 rows=1 width=8) (actual time=
0.021..0.025 rows=10 loops=1)
Index Cond: ((sortorder >= 1) AND (sortorder <= 10))
-> Index Scan using version_pkey on version (cost=0.00..9.35 rows=1 width=50) (actual time=5.551..5.552 rows=1 loops=10)
Index Cond: (hitlist_rows_reset_140.objectid = version.version_id)
Total runtime: 55.608 ms
(6 rows)

The view is defined like this:

em=> \d my_version
Table "test_schema.my_version"
Column | Type | Modifiers
------------+---------+-----------
version_id | integer | not null
Indexes:
"my_version_pkey" PRIMARY KEY, btree (version_id)

em=> \d version
View "test_schema.version"
Column | Type | Modifiers
------------+---------+-----------
version_id | integer |
parent_id | integer |
isosmiles | text |
coord_2d | text |
View definition:
SELECT v.version_id, v.parent_id, v.isosmiles, v.coord_2d
FROM registry.version v
JOIN my_version mv ON mv.version_id = v.version_id;

This is:

Postgres 8.4.4
Ubuntu Linux 2.6.32-27
Database: 8x7200 RAID 10, LSI RAID controller with BBU
WAL: 2x7200 RAID1

Non-default config parameters:

max_connections = 500
shared_buffers = 1000MB
work_mem = 128MB
synchronous_commit = off
full_page_writes = off
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB
track_activities = on
track_counts = off
track_functions = none
escape_string_warning = off

Thanks,
Craig

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andreas Kretschmer 2011-02-28 18:32:46 Re: optimalization
Previous Message Selva manickaraja 2011-02-28 16:25:53 Load and Stress on PostgreSQL 9.0