8.4.7, incorrect estimate

From: Wayne Conrad <wayne(at)databill(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: 8.4.7, incorrect estimate
Date: 2011-04-29 18:24:11
Message-ID: 4DBB024B.5050500@databill.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Howdy. We've got a query that takes less than a second unless we add a
"order by" to it, after which it takes 40 seconds. Here's the query:

select page_number, ps_id, ps_page_id from ps_page where ps_page_id in
(select ps_page_id from documents_ps_page where document_id in (select
document_id from temp_doc_ids)) order by ps_page_id;

The parts of the schema used in this query:

Table "public.ps_page"
Column | Type | Modifiers

-------------+---------+--------------------------------------------------------------
ps_page_id | integer | not null default
nextval('ps_page_ps_page_id_seq'::regclass)
ps_id | integer | not null
page_number | integer | not null
Indexes:
"ps_page_pkey" PRIMARY KEY, btree (ps_page_id)
"ps_page_ps_id_key" UNIQUE, btree (ps_id, page_number)

Table "public.documents_ps_page"
Column | Type | Modifiers
-------------+---------+-----------
document_id | text | not null
ps_page_id | integer | not null
Indexes:
"documents_ps_page_pkey" PRIMARY KEY, btree (document_id, ps_page_id)
"documents_ps_page_ps_page_id_idx" btree (ps_page_id)

temp_doc_ids (temporary table):
document_id text not null

The query with the "order by" (slow):

explain analyze select page_number, ps_id, ps_page_id from ps_page where
ps_page_id in (select ps_page_id from documents_ps_page where
document_id in (select document_id from temp_document_ids)) order by
ps_page_id
Merge Semi Join (cost=212570.02..3164648.31 rows=34398932
width=12) (actual time=54749.281..54749.295 rows=5 loops=1)
Merge Cond: (ps_page.ps_page_id = documents_ps_page.ps_page_id)
-> Index Scan using ps_page_pkey on ps_page
(cost=0.00..2999686.03 rows=86083592 width=12) (actual
time=0.029..36659.393 rows=85591467 loops=1)
-> Sort (cost=18139.39..18152.52 rows=6255 width=4) (actual
time=0.080..0.083 rows=5 loops=1)
Sort Key: documents_ps_page.ps_page_id
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=26.23..17808.09 rows=6255 width=4)
(actual time=0.044..0.073 rows=5 loops=1)
-> HashAggregate (cost=26.23..27.83 rows=200
width=32) (actual time=0.015..0.017 rows=5 loops=1)
-> Seq Scan on temp_document_ids
(cost=0.00..23.48 rows=1310 width=32) (actual time=0.004..0.007 rows=5
loops=1)
-> Index Scan using documents_ps_page_pkey on
documents_ps_page (cost=0.00..88.59 rows=31 width=42) (actual
time=0.009..0.010 rows=1 loops=5)
Index Cond: (documents_ps_page.document_id =
(temp_document_ids.document_id)::text)
Total runtime: 54753.028 ms

The query without the "order by" (fast):

production=> explain analyze select page_number, ps_id, ps_page_id from
ps_page where ps_page_id in (select ps_page_id from documents_ps_page
where document_id in (select document_id from temp_doc_ids));

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=17821.42..87598.71 rows=34398932 width=12) (actual
time=0.099..0.136 rows=5 loops=1)
-> HashAggregate (cost=17821.42..17871.46 rows=6255 width=4)
(actual time=0.083..0.096 rows=5 loops=1)
-> Nested Loop (cost=26.23..17808.28 rows=6255 width=4)
(actual time=0.047..0.076 rows=5 loops=1)
-> HashAggregate (cost=26.23..27.83 rows=200 width=32)
(actual time=0.014..0.015 rows=5 loops=1)
-> Seq Scan on temp_doc_ids (cost=0.00..23.48
rows=1310 width=32) (actual time=0.005..0.005 rows=5 loops=1)
-> Index Scan using documents_ps_page_pkey on
documents_ps_page (cost=0.00..88.59 rows=31 width=42) (actual
time=0.010..0.010 rows=1 loops=5)
Index Cond: (documents_ps_page.document_id =
temp_doc_ids.document_id)
-> Index Scan using ps_page_pkey on ps_page (cost=0.00..11.14
rows=1 width=12) (actual time=0.007..0.007 rows=1 loops=5)
Index Cond: (ps_page.ps_page_id = documents_ps_page.ps_page_id)
Total runtime: 0.213 ms
(10 rows)

We notice that in all cases, the plans contain some estimated row counts
that differ quite a bit from the actual row counts. We tried increasing
(from 100 to 1,000 and 10,000) the statistics targets for each of the
indexed columns, one at a time, and analyzing the table/column with each
change. This had no effect.

Postgres version 8.4.7 on AMD64, Debian Linux "wheezy" (aka "testing").

Where should we look next?

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Smith 2011-04-29 18:55:49 Re: Performance
Previous Message gnuoytr 2011-04-29 16:52:14 Re: FUSION-IO io cards