Re: TPC-R benchmarks

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Oleg Lebedev <oleg(dot)lebedev(at)waterford(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: TPC-R benchmarks
Date: 2003-10-07 23:59:17
Message-ID: 200310071659.17863.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom,

I've found the problem with TPC-R query #19. And it, unfortunately, appears
to be a problem in the PostgreSQL query planner.

To sum up the below: it appears that whenever a set of WHERE conditions
exceeds a certain level of complexity, the planner just ignores all
applicable indexes and goes for a seq scan. While this may be unavoidable
to some degree, it seems to me that we need to raise the threshold of
complexity at which it does this.

tpcr=# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3 20030226
(prerelease) (SuSE Linux)
(1 row)

I've tested a number of indexes on the query, and found the two most efficient
on subsets of the query. Thus:

explain analyze
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#33'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 8 and l_quantity <= 8 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=10380.70..10380.70 rows=1 width=30) (actual
time=161.61..161.61 rows=1 loops=1)
-> Nested Loop (cost=0.00..10380.67 rows=13 width=30) (actual
time=81.54..161.47 rows=17 loops=1)
-> Index Scan using idx_part_1 on part (cost=0.00..9466.33 rows=62
width=4) (actual time=81.21..137.24 rows=98 loops=1)
Index Cond: (p_brand = 'Brand#33'::bpchar)
Filter: (((p_container = 'SM CASE'::bpchar) OR (p_container =
'SM BOX'::bpchar) OR (p_container = 'SM PACK'::bpchar) OR (p_container = 'SM
PKG'::bpchar)) AND (p_size >= 1) AND (p_size <= 5))
-> Index Scan using idx_lineitem_3 on lineitem (cost=0.00..14.84
rows=1 width=26) (actual time=0.22..0.24 rows=0 loops=98)
Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND
(lineitem.l_quantity >= 8::numeric) AND (lineitem.l_quantity <= 18::numeric))
Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR
REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
Total runtime: 161.71 msec

This also works for a similar query:

explain analyze
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#52'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 14 and l_quantity <= 14 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);

Aggregate (cost=11449.36..11449.36 rows=1 width=30) (actual
time=195.72..195.72 rows=1 loops=1)
-> Nested Loop (cost=0.00..11449.29 rows=28 width=30) (actual
time=56.42..195.39 rows=48 loops=1)
-> Index Scan using idx_part_1 on part (cost=0.00..9466.33 rows=139
width=4) (actual time=56.15..153.17 rows=166 loops=1)
Index Cond: (p_brand = 'Brand#52'::bpchar)
Filter: (((p_container = 'MED BAG'::bpchar) OR (p_container =
'MED BOX'::bpchar) OR (p_container = 'MED PKG'::bpchar) OR (p_container =
'MED PACK'::bpchar)) AND (p_size >= 1) AND (p_size <= 10))
-> Index Scan using idx_lineitem_3 on lineitem (cost=0.00..14.29
rows=1 width=26) (actual time=0.23..0.25 rows=0 loops=166)
Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND
(lineitem.l_quantity >= 14::numeric) AND (lineitem.l_quantity <=
24::numeric))
Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR
REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
Total runtime: 195.82 msec
(9 rows)

If, however, I combine the two where clauses with an OR, the planner gets
confused and insists on loading the entire tables into memory (even though I
don't have that much memory):

explain
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#33'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 8 and l_quantity <= 8 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#52'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 14 and l_quantity <= 14 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);

Aggregate (cost=488301096525.25..488301096525.25 rows=1 width=146)
-> Nested Loop (cost=0.00..488301096525.15 rows=42 width=146)
Join Filter: ((("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'SM CASE'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND
("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM
CASE'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric)
AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND
("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'SM BOX'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND
("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM
BOX'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric)
AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND
("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'SM PACK'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND
("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM
PACK'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric)
AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND
("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'SM PKG'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND
("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM
PKG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric)
AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND
("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'MED BAG'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND
("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED
BAG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >=
14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >=
1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'MED BOX'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND
("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED
BOX'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >=
14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >=
1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'MED PKG'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND
("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED
PKG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >=
14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >=
1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND
("inner".p_container = 'MED PACK'::bpchar) AND ("inner".p_partkey =
"outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND
("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric)
AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED
PACK'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND
("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >=
14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >=
1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN
PERSON'::bpchar)))
-> Seq Scan on lineitem (cost=0.00..235620.15 rows=6001215
width=95)
-> Seq Scan on part (cost=0.00..7367.00 rows=200000 width=51)

You'll pardon me for not doing an "ANALYZE", but I didn't want to wait
overnight. Manually disabling Seqscan and Nestloop did nothing to affect
this query plan; neither did removing the aggregate.

Tommorrow I will test 7.4 Beta 4.

How can we fix this?

--
-Josh Berkus
Aglio Database Solutions
San Francisco

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-10-08 04:15:57 Re: TPC-R benchmarks
Previous Message Timothy D. Witham 2003-10-07 18:34:33 Re: TPC-R benchmarks