Re: Inefficient query plan

From: Jann Röder <roederja(at)ethz(dot)ch>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Inefficient query plan
Date: 2010-08-23 13:19:20
Message-ID: i4tsgo$sm8$1@dough.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks for your help,
here is the information you requested:

Table information: A = Papers, B = PaperReferences

wos-db=> \d Papers
Table "public.papers"
Column | Type | Modifiers
------------------+-------------------------+-----------
itemid | character(15) | not null
t9id | integer |
artn | character varying |
doi | character varying |
pii | character varying |
unsp | character varying |
issueid | character(10) | not null
title | character varying(1500) | not null
titleenhancement | character varying(500) |
beginningpage | character varying(19) |
pagecount | integer | not null
documenttype | character(1) | not null
abstract | text |
Indexes:
"papers_pkey" PRIMARY KEY, btree (itemid)
"idx_papers_issueid" btree (issueid)
Foreign-key constraints:
"papers_issueid_fkey" FOREIGN KEY (issueid) REFERENCES
journals(issueid) ON DELETE CASCADE
Referenced by:
TABLE "authorkeywords" CONSTRAINT "authorkeywords_itemid_fkey"
FOREIGN KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE
TABLE "authors" CONSTRAINT "authors_itemid_fkey" FOREIGN KEY
(itemid) REFERENCES papers(itemid) ON DELETE CASCADE
TABLE "grantnumbers" CONSTRAINT "grantnumbers_itemid_fkey" FOREIGN
KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE
TABLE "keywordsplus" CONSTRAINT "keywordsplus_itemid_fkey" FOREIGN
KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE
TABLE "languages" CONSTRAINT "languages_itemid_fkey" FOREIGN KEY
(itemid) REFERENCES papers(itemid) ON DELETE CASCADE
TABLE "paperreferences" CONSTRAINT "paperreferences_fromitemid_fkey"
FOREIGN KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE

wos-db=> \d PaperReferences
Table "public.paperreferences"
Column | Type | Modifiers
--------------------+-----------------------+-----------
itemid | character varying(15) | not null
t9id | integer |
citedauthor | character varying(75) |
citedartn | character varying |
citeddoi | character varying |
citedpii | character varying |
citedunsp | character varying |
citedreferenceyear | integer |
citedtitle | character varying(20) | not null
citedvolume | character varying(4) |
citedpage | character varying(5) |
referenceindex | integer | not null
Indexes:
"paperreferences_pkey" PRIMARY KEY, btree (itemid, referenceindex)
Foreign-key constraints:
"paperreferences_fromitemid_fkey" FOREIGN KEY (itemid) REFERENCES
papers(itemid) ON DELETE CASCADE

I just noticed that PaperReferences uses character varying (15) and
Papers uses character(15). Stupid mistake of mine. Do you think this
might cause the bad query planning? I will alter the table to use
character(15) in both cases and see if that helps.

postgresql.conf:
max_connections = 20
shared_buffers = 256MB
work_mem = 10MB
maintenance_work_mem = 128MB
max_stack_depth = 4MB
synchronous_commit = off
wal_buffers = 1MB
checkpoint_segments = 10
effective_cache_size = 768MB
default_statistics_target = 200
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.simple'

The query I run:
SELECT p.ItemID FROM Papers AS p, PaperReferences AS r WHERE p.itemID =
r.ItemID AND p.issueID = 'A1983PW823'

Query plan with seqscan enabled:

"Hash Join (cost=512.71..17709356.53 rows=8283226 width=16)"
" Hash Cond: ((r.itemid)::bpchar = p.itemid)"
" -> Seq Scan on paperreferences r (cost=0.00..15110856.68
rows=670707968 width=16)"
" -> Hash (cost=500.29..500.29 rows=994 width=16)"
" -> Index Scan using idx_papers_issueid on papers p
(cost=0.00..500.29 rows=994 width=16)"
" Index Cond: (issueid = 'A1983PW823'::bpchar)"

Query plan with seqscan disbaled

"Hash Join (cost=10000000280.88..10017668625.22 rows=4233278 width=16)"
" Hash Cond: ((r.itemid)::bpchar = p.itemid)"
" -> Seq Scan on paperreferences r
(cost=10000000000.00..10015110856.68 rows=670707968 width=16)"
" -> Hash (cost=274.53..274.53 rows=508 width=16)"
" -> Index Scan using idx_papers_issueid on papers p
(cost=0.00..274.53 rows=508 width=16)"
" Index Cond: (issueid = 'A1983PW823'::bpchar)"

Do you need an EXPLAIN ANALYZE output? Since it takes so long I can't
easily post one right now. But maybe I can get one over night.

My Hardware is an iMac running OS X 10.6.4 with 1.5 GB RAM and a 2.1 GHz
(or so) core 2 Duo processor.

Jann

Am 23.08.10 14:08, schrieb Kevin Grittner:
> Jann Röder wrote:
> Am 23.08.10 12:18, schrieb Scott Marlowe:
>
>>> What happens if you try
>>>
>>> set enable_seqscan=off;
>>> (your query here)
>>>
>> Tried that already. The query plan is exactly the same.
>
> Exactly? Not even the cost shown for the seq scan changed?
>
> You are almost certainly omitting some crucial piece of information
> in your report. Please look over this page and post a more complete
> report. In particular, please show the results of \d for both tables
> (or of pg_dump -s -t 'tablename'), your complete postgresql.conf file
> stripped of comments, and a description of your hardware and OS.
>
> -Kevin
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Grzegorz Jaśkiewicz 2010-08-23 13:28:51 Re: Inefficient query plan
Previous Message Kevin Grittner 2010-08-23 12:10:30 Re: Inefficient query plan