inaccurate stats on large tables

From: Kiran Mukhyala <mukhyala(dot)kiran(at)gene(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: inaccurate stats on large tables
Date: 2008-09-04 18:21:52
Message-ID: 1220552512.3853.320.camel@sausalito.gene.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

I am running a select on a large table with two where
conditions.
Explain analyze shows that the estimated number of rows returned
(190760) is much more than the actual rows returned (58221),
which is probably the underlying cause for the poor performance
I am seeing.

Can someone please tell me how to improve the query planner
estimate? I did try vacuum analyze. Here are some details:

Explain plan:
unison(at)csb-test=> explain analyze select * from paliasorigin a
where
a.origin_id=20 and a.tax_id=9606;


QUERY PLAN
--------------------------------------------------------------------------
Bitmap Heap Scan on paliasorigin a (cost=4901.38..431029.54
rows=190760 width=118) (actual time=12.447..112.902 rows=58221
loops=1)
Recheck Cond: ((origin_id = 20) AND (tax_id = 9606))
-> Bitmap Index Scan on paliasorigin_search3_idx
(cost=0.00..4853.69 rows=190760 width=0) (actual
time=11.407..11.407
rows=58221 loops=1)
Index Cond: ((origin_id = 20) AND (tax_id = 9606))

Schema:
unison(at)csb-test=> \d+ paliasorigin
Column | Type |
Modifiers |
-----------+--------------------------+------------
palias_id | integer | not null
origin_id | integer | not null
alias | text | not null
descr | text |
tax_id | integer |
added | timestamp with time zone | not null default
timenow()
Indexes:
"palias_pkey" PRIMARY KEY, btree (palias_id)
"paliasorigin_alias_unique_in_origin_idx" UNIQUE, btree
(origin_id,
alias)
"paliasorigin_alias_casefold_idx" btree (upper(alias))
CLUSTER
"paliasorigin_alias_idx" btree (alias)
"paliasorigin_o_idx" btree (origin_id)
"paliasorigin_search1_idx" btree (palias_id, origin_id)
"paliasorigin_search3_idx" btree (origin_id, tax_id,
palias_id)
"paliasorigin_tax_id_idx" btree (tax_id)
Foreign-key constraints:
"origin_id_exists" FOREIGN KEY (origin_id) REFERENCES
origin(origin_id) ON UPDATE CASCADE ON DELETE CASCADE
Has OIDs: no


Number of rows:
unison(at)csb-test=> select count(*) from paliasorigin;
count
----------
37909009
(1 row)

Pg version:
unison(at)csb-test=> select version();
version
--------------------------------------------------------------------------------------------
PostgreSQL 8.3.3 on x86_64-unknown-linux-gnu, compiled by GCC
gcc (GCC)
4.1.0 (SUSE Linux)
(1 row)


Info from analyze verbose:
unison(at)csb-test=> analyze verbose paliasorigin;
INFO: analyzing "unison.paliasorigin"
INFO: "paliasorigin": scanned 300000 of 692947 pages,
containing
16409041 live rows and 0 dead rows; 300000 rows in sample,
37901986
estimated total rows
ANALYZE
Time: 21999.506 ms


Thank you,

-Kiran Mukhyala

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matt Smiley 2008-09-04 18:45:37 Re: limit clause breaks query planner?
Previous Message Scott Carey 2008-09-04 18:04:29 Re: limit clause breaks query planner?