Unexpected seq scans when expected result is 1 row out of millions

From: Svetlin Manavski <svetlin(dot)manavski(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Unexpected seq scans when expected result is 1 row out of millions
Date: 2011-07-15 12:57:18
Message-ID: CAMm+ggR=-+pqw9Qo7uQ6dx9pa0bUfBEXz6z-+UfUNGvRuJuBXA@mail.gmail.com
Lists: pgsql-performance

Hi all,
here is my postgresql configuration:

"version";"PostgreSQL 9.0.3 on amd64-portbld-freebsd8.0, compiled by GCC cc
(GCC) 4.2.1 20070719 [FreeBSD], 64-bit"
"bytea_output";"escape"
"checkpoint_segments";"64"
"client_encoding";"UNICODE"
"effective_cache_size";"6GB"
"fsync";"off"
"lc_collate";"C"
"lc_ctype";"C"
"listen_addresses";"*"
"log_destination";"syslog"
"max_connections";"20"
"max_stack_depth";"2MB"
"server_encoding";"UTF8"
"shared_buffers";"4GB"
"silent_mode";"on"
"synchronous_commit";"off"
"TimeZone";"Europe/Jersey"
"update_process_title";"off"
"work_mem";"24MB"

I have a partitioned table tcpsessiondata:

CREATE TABLE appqosdata.tcpsessiondata
(
    detectorid smallint not null,
    createdtime bigint not null,
    sessionid bigint not null,
    ...
);

That table has many millions of rows in each partition and no data in the
parent itself. Here is an example:

select count(*) from appqosdata.tcpsessiondata_1;
count
----------
49377910
(1 row)

Every partition has a primary key on (detectorid, createdtime); a sketch of one child's definition is below.
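
For reference, each child looks roughly like this (a minimal sketch, assuming the usual 9.0 inheritance-based partitioning; the column list is abbreviated and the sessionid CHECK range is hypothetical):

CREATE TABLE appqosdata.tcpsessiondata_1
(
    PRIMARY KEY (detectorid, createdtime),
    -- hypothetical bounds; the real partition ranges are not shown in this post
    CHECK (sessionid >= 0 AND sessionid < 1000000)
) INHERITS (appqosdata.tcpsessiondata);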

I run the following query on that table:
select
    cast(SD.detectorid as numeric),
    cast((createdtime / 61000000000::bigint) as numeric) as timegroup,
    sum(datafromsource) + sum(datafromdestination) as numbytes,
    cast(sum(packetsfromsource) + sum(packetsfromdestination) as numeric) as numpackets
from
    appqosdata.tcpsessiondata SD
where
    SD.detectorid >= 0
    and SD.createdtime >= 1297266601368086000::bigint
    and SD.createdtime < 1297270202368086000::bigint
group by SD.detectorid, timegroup;
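
(A quick note on the constants, since they are easy to misread -- this is an inference from the values, not stated explicitly anywhere: createdtime appears to be a nanosecond-resolution epoch timestamp, because

    1297270202368086000 - 1297266601368086000 = 3601000000000 ns, i.e. about one hour,

and 61000000000 ns = 61 s, so timegroup buckets the rows into 61-second windows.)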

The table is partitioned by sessionid, which is not used in this particular
query, so I already expect all partitions to be touched. However, the planner
makes a bad scan choice on at least a couple of partitions:

"HashAggregate (cost=5679026.42..5679028.76 rows=67 width=34) (actual
time=160113.366..160113.366 rows=0 loops=1)"
" Output: (sd.detectorid)::numeric, (((sd.createdtime /
61000000000::bigint))::numeric), (sum(sd.datafromsource) +
sum(sd.datafromdestination)), ((sum(sd.packetsfromsource) +
sum(sd.packetsfromdestination)))::numeric, sd.detectorid"
" -> Result (cost=0.00..5679025.41 rows=67 width=34) (actual
time=160113.360..160113.360 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime, sd.datafromsource,
sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination,
((sd.createdtime / 61000000000::bigint))::numeric"
" -> Append (cost=0.00..5679025.08 rows=67 width=34) (actual
time=160113.356..160113.356 rows=0 loops=1)"
" -> Seq Scan on appqosdata.tcpsessiondata sd
(cost=0.00..23.65 rows=1 width=34) (actual time=0.002..0.002 rows=0
loops=1)"
" Output: sd.detectorid, sd.createdtime,
sd.datafromsource, sd.datafromdestination, sd.packetsfromsource,
sd.packetsfromdestination"
" Filter: ((sd.detectorid >= 0) AND (sd.createdtime >=
1297266601368086000::bigint) AND (sd.createdtime <
1297270202368086000::bigint))"
" -> Seq Scan on appqosdata.tcpsessiondata_1 sd
(cost=0.00..1373197.46 rows=1 width=34) (actual time=46436.737..46436.737
rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime,
sd.datafromsource, sd.datafromdestination, sd.packetsfromsource,
sd.packetsfromdestination"
" Filter: ((sd.detectorid >= 0) AND (sd.createdtime >=
1297266601368086000::bigint) AND (sd.createdtime <
1297270202368086000::bigint))"
" -> Seq Scan on appqosdata.tcpsessiondata_2 sd
(cost=0.00..2447484.00 rows=1 width=34) (actual time=108359.967..108359.967
rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime,
sd.datafromsource, sd.datafromdestination, sd.packetsfromsource,
sd.packetsfromdestination"
" Filter: ((sd.detectorid >= 0) AND (sd.createdtime >=
1297266601368086000::bigint) AND (sd.createdtime <
1297270202368086000::bigint))"
" -> Index Scan using tcpsessiondata_3_pkey on
appqosdata.tcpsessiondata_3 sd (cost=0.00..11.51 rows=1 width=34) (actual
time=0.016..0.016 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime,
sd.datafromsource, sd.datafromdestination, sd.packetsfromsource,
sd.packetsfromdestination"
" Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime
>= 1297266601368086000::bigint) AND (sd.createdtime <
1297270202368086000::bigint))"

... (many more partitions here)....

" -> Index Scan using tcpsessiondata_61_pkey on
appqosdata.tcpsessiondata_61 sd (cost=0.00..8162.42 rows=1 width=34)
(actual time=25.446..25.446 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime,
sd.datafromsource, sd.datafromdestination, sd.packetsfromsource,
sd.packetsfromdestination"
" Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime
>= 1297266601368086000::bigint) AND (sd.createdtime <
1297270202368086000::bigint))"
" -> Index Scan using tcpsessiondata_62_pkey on
appqosdata.tcpsessiondata_62 sd (cost=0.00..11.51 rows=1 width=34) (actual
time=0.008..0.008 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime,
sd.datafromsource, sd.datafromdestination, sd.packetsfromsource,
sd.packetsfromdestination"
" Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime
>= 1297266601368086000::bigint) AND (sd.createdtime <
1297270202368086000::bigint))"
" -> Index Scan using tcpsessiondata_63_pkey on
appqosdata.tcpsessiondata_63 sd (cost=0.00..11.51 rows=1 width=34) (actual
time=0.006..0.006 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime,
sd.datafromsource, sd.datafromdestination, sd.packetsfromsource,
sd.packetsfromdestination"
" Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime
>= 1297266601368086000::bigint) AND (sd.createdtime <
1297270202368086000::bigint))"
" -> Seq Scan on appqosdata.tcpsessiondata_64 sd
(cost=0.00..13.00 rows=1 width=34) (actual time=0.102..0.102 rows=0
loops=1)"
" Output: sd.detectorid, sd.createdtime,
sd.datafromsource, sd.datafromdestination, sd.packetsfromsource,
sd.packetsfromdestination"
" Filter: ((sd.detectorid >= 0) AND (sd.createdtime >=
1297266601368086000::bigint) AND (sd.createdtime <
1297270202368086000::bigint))"
" -> Seq Scan on appqosdata.tcpsessiondata_65 sd
(cost=0.00..117.64 rows=1 width=34) (actual time=0.854..0.854 rows=0
loops=1)"
" Output: sd.detectorid, sd.createdtime,
sd.datafromsource, sd.datafromdestination, sd.packetsfromsource,
sd.packetsfromdestination"
" Filter: ((sd.detectorid >= 0) AND (sd.createdtime >=
1297266601368086000::bigint) AND (sd.createdtime <
1297270202368086000::bigint))"
" -> Index Scan using tcpsessiondata_66_pkey on
appqosdata.tcpsessiondata_66 sd (cost=0.00..11.51 rows=1 width=34) (actual
time=0.007..0.007 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime,
sd.datafromsource, sd.datafromdestination, sd.packetsfromsource,
sd.packetsfromdestination"
" Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime
>= 1297266601368086000::bigint) AND (sd.createdtime <
1297270202368086000::bigint))"
" -> Index Scan using tcpsessiondata_67_pkey on
appqosdata.tcpsessiondata_67 sd (cost=0.00..11.51 rows=1 width=34) (actual
time=0.005..0.005 rows=0 loops=1)"
" Output: sd.detectorid, sd.createdtime,
sd.datafromsource, sd.datafromdestination, sd.packetsfromsource,
sd.packetsfromdestination"
" Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime
>= 1297266601368086000::bigint) AND (sd.createdtime <
1297270202368086000::bigint))"
"Total runtime: 160114.339 ms"

The question is: why do we get a seq scan on appqosdata.tcpsessiondata_1 and
appqosdata.tcpsessiondata_2, even though the planner correctly estimates that
only 1 row out of millions could potentially be selected? As you can see, ~90%
of the runtime is spent on those two partitions, even though they are not
apparently different from any of the others.
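
For reference, a few diagnostics that could help narrow this down (a sketch; the partition names are taken from the plan above):

-- check whether the planner is even willing to use the index here:
SET enable_seqscan = off;
EXPLAIN ANALYZE
select ...;  -- the same query as above

-- compare the statistics the planner sees for fast vs. slow partitions:
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('tcpsessiondata_1', 'tcpsessiondata_2', 'tcpsessiondata_3');

-- and refresh statistics on the misbehaving partitions:
ANALYZE appqosdata.tcpsessiondata_1;
ANALYZE appqosdata.tcpsessiondata_2;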

I would appreciate any help with this issue.

Thank you,
Svetlin Manavski
