From: | Markus Schaber <schabios(at)logi-track(dot)com> |
---|---|
To: | PostgreSQL Performance List <pgsql-performance(at)postgresql(dot)org> |
Subject: | Multiple Uniques |
Date: | 2004-09-02 13:43:40 |
Message-ID: | 20040902154340.5b1a669a@kingfisher.intern.logi-track.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello,
Today, we stumbled about the following query plan on PostGreSQL 7.4.1:
logigis=# explain select count(id) from (select distinct id from (select distinct ref_in_id as id from streets union select distinct nref_in_id as id from streets) as blubb) as blabb;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=33246086.32..33246086.32 rows=1 width=8)
-> Subquery Scan blabb (cost=32970061.50..33246085.82 rows=200 width=8)
-> Unique (cost=32970061.50..33246083.82 rows=200 width=8)
-> Sort (cost=32970061.50..33108072.66 rows=55204464 width=8)
Sort Key: id
-> Subquery Scan blubb (cost=23697404.79..24525471.75 rows=55204464 width=8)
-> Unique (cost=23697404.79..23973427.11 rows=55204464 width=8)
-> Sort (cost=23697404.79..23835415.95 rows=55204464 width=8)
Sort Key: id
-> Append (cost=7212374.04..15252815.03 rows=55204464 width=8)
-> Subquery Scan "*SELECT* 1" (cost=7212374.04..7626407.52 rows=27602232 width=8)
-> Unique (cost=7212374.04..7350385.20 rows=27602232 width=8)
-> Sort (cost=7212374.04..7281379.62 rows=27602232 width=8)
Sort Key: ref_in_id
-> Seq Scan on streets (cost=0.00..3129090.32 rows=27602232 width=8)
-> Subquery Scan "*SELECT* 2" (cost=7212374.04..7626407.52 rows=27602232 width=8)
-> Unique (cost=7212374.04..7350385.20 rows=27602232 width=8)
-> Sort (cost=7212374.04..7281379.62 rows=27602232 width=8)
Sort Key: nref_in_id
-> Seq Scan on streets (cost=0.00..3129090.32 rows=27602232 width=8)
(20 rows)
I might have to add that this query is not an every-day query, it's
merely part of some statistics that a colleague needs for some
estimations as he has to create a tool that works on this data.
Both ref_in_id and nref_in_id are non-indexed columns with type int8.
I was somehow irritated by the fact that the Query Plan contains 4 Uniques.
Then, I tried the following query:
logigis=# explain select count(id) from (select distinct id from (select ref_in_id as id from streets union select nref_in_id as id from streets) as blubb) as blabb;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=24803496.57..24803496.57 rows=1 width=8)
-> Subquery Scan blabb (cost=24527471.75..24803496.07 rows=200 width=8)
-> Unique (cost=24527471.75..24803494.07 rows=200 width=8)
-> Sort (cost=24527471.75..24665482.91 rows=55204464 width=8)
Sort Key: id
-> Subquery Scan blubb (cost=15254815.03..16082881.99 rows=55204464 width=8)
-> Unique (cost=15254815.03..15530837.35 rows=55204464 width=8)
-> Sort (cost=15254815.03..15392826.19 rows=55204464 width=8)
Sort Key: id
-> Append (cost=0.00..6810225.28 rows=55204464 width=8)
-> Subquery Scan "*SELECT* 1" (cost=0.00..3405112.64 rows=27602232 width=8)
-> Seq Scan on streets (cost=0.00..3129090.32 rows=27602232 width=8)
-> Subquery Scan "*SELECT* 2" (cost=0.00..3405112.64 rows=27602232 width=8)
-> Seq Scan on streets (cost=0.00..3129090.32 rows=27602232 width=8)
(14 rows)
And after re-parsing the documentation about UNION, the following one:
logigis=# explain select count(id) from (select ref_in_id as id from streets union select nref_in_id as id from streets) as blubb;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Aggregate (cost=16220893.16..16220893.16 rows=1 width=8)
-> Subquery Scan blubb (cost=15254815.03..16082881.99 rows=55204464 width=8)
-> Unique (cost=15254815.03..15530837.35 rows=55204464 width=8)
-> Sort (cost=15254815.03..15392826.19 rows=55204464 width=8)
Sort Key: id
-> Append (cost=0.00..6810225.28 rows=55204464 width=8)
-> Subquery Scan "*SELECT* 1" (cost=0.00..3405112.64 rows=27602232 width=8)
-> Seq Scan on streets (cost=0.00..3129090.32 rows=27602232 width=8)
-> Subquery Scan "*SELECT* 2" (cost=0.00..3405112.64 rows=27602232 width=8)
-> Seq Scan on streets (cost=0.00..3129090.32 rows=27602232 width=8)
(10 rows)
Those queries should give the same result.
So, now my question is, why does the query optimizer not recognize that
it can throw away those "non-unique" Sort/Unique passes?
Is PostGreSQL 8 capable of this optimization?
Thanks,
Markus
--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:schabios(at)logi-track(dot)com | www.logi-track.com
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2004-09-02 14:22:37 | Re: Multiple Uniques |
Previous Message | Raymond O'Donnell | 2004-09-02 10:46:02 | Re: odbc/ado problems |