Bad query plan when the wrong data type is used

From: Laszlo Nagy <gandalf(at)shopzeus(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Cc: Daniel Fekete <danieleff(at)gmail(dot)com>
Subject: Bad query plan when the wrong data type is used
Date: 2011-02-08 14:15:27
Message-ID: 4D514FFF.9060308@shopzeus.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

This query:

select p.id,p.producer_id,visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id =
visa.variation_item_id
where visa.id =4

runs in 43 msec. The "visa.id" column has int4 datatype. The query plan
uses an index condition:

"Nested Loop (cost=0.00..26.19 rows=1 width=28)"
" -> Nested Loop (cost=0.00..17.75 rows=1 width=24)"
" -> Index Scan using variation_item_sellingsite_asin_pkey on
variation_item_sellingsite_asin visa (cost=0.00..8.58 rows=1 width=16)"
" Index Cond: (id = 4)"
" -> Index Scan using pk_product_id on product p
(cost=0.00..9.16 rows=1 width=16)"
" Index Cond: (p.id = visa.product_id)"
" -> Index Scan using pk_variation_item_id on variation_item vi
(cost=0.00..8.43 rows=1 width=12)"
" Index Cond: (vi.id = visa.variation_item_id)"

This query:

select p.id,p.producer_id,visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id =
visa.variation_item_id
where visa.id =4.0

Runs for 1144 msec! Query plan uses seq scan + filter:

"Nested Loop (cost=33957.27..226162.68 rows=14374 width=28)"
" -> Hash Join (cost=33957.27..106190.76 rows=14374 width=20)"
" Hash Cond: (visa.variation_item_id = vi.id)"
" -> Seq Scan on variation_item_sellingsite_asin visa
(cost=0.00..71928.04 rows=14374 width=16)"
" Filter: ((id)::numeric = 4.0)"
" -> Hash (cost=22026.01..22026.01 rows=954501 width=12)"
" -> Seq Scan on variation_item vi (cost=0.00..22026.01
rows=954501 width=12)"
" -> Index Scan using pk_product_id on product p (cost=0.00..8.33
rows=1 width=16)"
" Index Cond: (p.id = visa.product_id)"

Which is silly. I think that PostgreSQL converts the int side to a
float, and then compares them.

It would be better to do this, for each item in the loop:

* evaluate the right side (which is float)
* tell if it is an integer or not
* if not an integer, then discard the row immediately
* otherwise use its integer value for the index scan

The result is identical, but it makes possible to use the index scan. Of
course, I know that the query itself is wrong, because I sould not use a
float where an int is expected. But this CAN be optimized, so I think it
should be! My idea for the query optimizer is not to use the "wider"
data type, but use the data type that has an index on it instead.

(I spent an hour figuring out what is wrong with my program. In some
cases it was slow, in other cases it was really fast, and I never got an
error message.)

What do you think?

Laszlo

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Shaun Thomas 2011-02-08 14:23:02 Re: Really really slow select count(*)
Previous Message Pavel Stehule 2011-02-08 10:45:52 Re: compare languages