From: | "Jussi Pakkanen" <jpakkane(at)gmail(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #4462: Adding COUNT to query causes massive slowdown |
Date: | 2008-10-09 09:14:45 |
Message-ID: | 200810090914.m999EjOJ073923@wwwmaster.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged online:
Bug reference: 4462
Logged by: Jussi Pakkanen
Email address: jpakkane(at)gmail(dot)com
PostgreSQL version: 8.3.3
Operating system: Ubuntu x86 8/04
Description: Adding COUNT to query causes massive slowdown
Details:
I have a table in the following format
code CHARACTER(9) NOT NULL
text VARCHAR(200)
I have built an INDEX on "code", VACUUMed and ANALYZEd the table.
I have about 32 million rows and roughly 200 000 unique "code" elements.
I determine the unique codes using the following SQL query:
EXPLAIN SELECT DISTINCT code FROM log;
QUERY PLAN
----------------------------------------------------------------------------
-----------
Unique (cost=0.00..1384173.89 rows=6393 width=10)
-> Index Scan using codeindex on log (cost=0.00..1303930.83
rows=32097224 width=10)
(2 rows)
This takes about 4 minutes (it's a slow machine) but pretty much works as
expected.
However when I try to count the amount of distinct codes, I get this:
EXPLAIN SELECT COUNT(DISTINCT code) FROM log;
QUERY PLAN
----------------------------------------------------------------------------
-----
Aggregate (cost=100801488.30..100801488.31 rows=1 width=10)
-> Seq Scan on log (cost=100000000.00..100721245.24 rows=32097224
width=10)
(2 rows)
For some reason PostgreSQL wants to do a full table scan in this case. This
takes over 11 minutes.
Transferring the result set from the first query to a Python client program
and calculating the lines there takes about 4 seconds. This makes pg over
100 times slower than the naive implementation.
If I do the same COUNT using a view, it uses the index and is fast:
CREATE VIEW distcode AS SELECT DISTINCT code FROM log;
EXPLAIN SELECT COUNT(*) FROM distcode;
QUERY PLAN
----------------------------------------------------------------------------
-----------------
Aggregate (cost=1384253.81..1384253.82 rows=1 width=0)
-> Unique (cost=0.00..1384173.89 rows=6393 width=10)
-> Index Scan using codeindex on log (cost=0.00..1303930.83
rows=320972
I tried setting seq_scan to off. It did not help.
Due to reasons beyond my control, I can't test version 8.3.4 until the next
Ubuntu is released (at the end of this month).
From | Date | Subject | |
---|---|---|---|
Next Message | ivocs admin | 2008-10-09 09:37:00 | Locale (unsupported) bug. uk_UA.KOI8-U |
Previous Message | Dmitry Orlov | 2008-10-09 06:17:10 | BUG #4461: Should avoid create tblspcs in system location |