Seq scan over 3.3 millions of rows instead of using date and pattern indexes

From: "Andrus" <kobruleht2(at)hot(dot)ee>
To: pgsql-performance(at)postgresql(dot)org
Subject: Seq scan over 3.3 millions of rows instead of using date and pattern indexes
Date: 2008-11-30 20:17:34
Message-ID: ggusgk$1i7o$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

explain analyze SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
WHERE dok.kuupaev>='2008-05-01'
and
( (
dok.doktyyp IN
('V','G','Y','K','I','T','D','N','H','M','E','B','A','R','C','F','J','Q')
AND CASE WHEN NOT dok.objrealt OR dok.doktyyp='I' THEN dok.yksus
ELSE rid.kuluobjekt END LIKE 'AEGVIIDU%'
)
OR
( dok.doktyyp IN ('O','S','I','U','D','P')
AND CASE WHEN dok.objrealt THEN rid.kuluobjekt ELSE dok.sihtyksus
END LIKE 'AEGVIIDU%'
)
)

"Aggregate (cost=369240.67..369240.68 rows=1 width=0) (actual
time=41135.557..41135.560 rows=1 loops=1)"
" -> Hash Join (cost=96614.24..369229.39 rows=4508 width=0) (actual
time=5859.704..40912.979 rows=59390 loops=1)"
" Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
" Join Filter: (((("inner".doktyyp = 'V'::bpchar) OR ("inner".doktyyp
= 'G'::bpchar) OR ("inner".doktyyp = 'Y'::bpchar) OR ("inner".doktyyp =
'K'::bpchar) OR ("inner".doktyyp = 'I'::bpchar) OR ("inner".doktyyp =
'T'::bpchar) OR ("inner".doktyyp = 'D'::bpchar) OR ("inner".doktyyp =
'N'::bpchar) OR ("inner".doktyyp = 'H'::bpchar) OR ("inner".doktyyp =
'M'::bpchar) OR ("inner".doktyyp = 'E'::bpchar) OR ("inner".doktyyp =
'B'::bpchar) OR ("inner".doktyyp = 'A'::bpchar) OR ("inner".doktyyp =
'R'::bpchar) OR ("inner".doktyyp = 'C'::bpchar) OR ("inner".doktyyp =
'F'::bpchar) OR ("inner".doktyyp = 'J'::bpchar) OR ("inner".doktyyp =
'Q'::bpchar)) AND (CASE WHEN ((NOT ("inner".objrealt)::boolean) OR
("inner".doktyyp = 'I'::bpchar)) THEN "inner".yksus ELSE "outer".kuluobjekt
END ~~ 'AEGVIIDU%'::text)) OR ((("inner".doktyyp = 'O'::bpchar) OR
("inner".doktyyp = 'S'::bpchar) OR ("inner".doktyyp = 'I'::bpchar) OR
("inner".doktyyp = 'U'::bpchar) OR ("inner".doktyyp = 'D'::bpchar) OR
("inner".doktyyp = 'P'::bpchar)) AND (CASE WHEN ("inner".objrealt)::boolean
THEN "outer".kuluobjekt ELSE "inner".sihtyksus END ~~ 'AEGVIIDU%'::text)))"
" -> Seq Scan on rid (cost=0.00..129911.53 rows=3299853 width=18)
(actual time=0.039..17277.888 rows=3299777 loops=1)"
" -> Hash (cost=92983.97..92983.97 rows=336110 width=38) (actual
time=3965.478..3965.478 rows=337455 loops=1)"
" -> Bitmap Heap Scan on dok (cost=1993.66..92983.97
rows=336110 width=38) (actual time=135.810..2389.703 rows=337455 loops=1)"
" Recheck Cond: (kuupaev >= '2008-05-01'::date)"
" Filter: ((doktyyp = 'V'::bpchar) OR (doktyyp =
'G'::bpchar) OR (doktyyp = 'Y'::bpchar) OR (doktyyp = 'K'::bpchar) OR
(doktyyp = 'I'::bpchar) OR (doktyyp = 'T'::bpchar) OR (doktyyp =
'D'::bpchar) OR (doktyyp = 'N'::bpchar) OR (doktyyp = 'H'::bpchar) OR
(doktyyp = 'M'::bpchar) OR (doktyyp = 'E'::bpchar) OR (doktyyp =
'B'::bpchar) OR (doktyyp = 'A'::bpchar) OR (doktyyp = 'R'::bpchar) OR
(doktyyp = 'C'::bpchar) OR (doktyyp = 'F'::bpchar) OR (doktyyp =
'J'::bpchar) OR (doktyyp = 'Q'::bpchar) OR (doktyyp = 'O'::bpchar) OR
(doktyyp = 'S'::bpchar) OR (doktyyp = 'I'::bpchar) OR (doktyyp =
'U'::bpchar) OR (doktyyp = 'D'::bpchar) OR (doktyyp = 'P'::bpchar))"
" -> Bitmap Index Scan on dok_kuupaev_idx
(cost=0.00..1993.66 rows=347618 width=0) (actual time=97.881..97.881
rows=337770 loops=1)"
" Index Cond: (kuupaev >= '2008-05-01'::date)"
"Total runtime: 41136.348 ms"

8.1.4 Db is analyzed, default_statistics_target is 40.
PostgreSql still choices seq scan over rid.

This query can optimized as follows:

1. kuupaev >= '2008-05-01' index can reduce number of scanned rows 10 times
(to 330000)
2. AEGVIIDU% can reduce number of rows 6 times (to 60000)

How to force pg to use indexes for those conditions ?

This query can be executed against different shops groups (int this case
there is other value than AEGVIIDU) and for different date.

There are 6 different shop groups containing roughly same number or records
each.
So using index on AEGVIIDU% can decrease number of scanned rows 6 times.
Usually 90% of dok records contain 'Y' in dok.doktyyp column and
dok.objrealt is false for those records.

Is it possible to use come functional index or other method to speed it ?

rid.kuluobjekt, dok.yksus and dok.sihtyksus types are char(10).

There are indexes

dok(yksus bpchar_pattern_ops)
dok(sihtyksus bpchar_pattern_ops)

Is it possible to re-write query that it uses those indexes or create some
other indexes?
Using 8.1.4, us-en locale, utf-8 db encoding.
select column list is removed from sample.

Andrus.

Browse pgsql-performance by date

  From Date Subject
Next Message PFC 2008-12-01 13:24:38 Re: Query optimization
Previous Message Marc Cousin 2008-11-30 19:37:39 Re: Query optimization