functional index not used, looping simpler query just faster

From: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
To: PGSQL <pgsql-general(at)postgresql(dot)org>
Subject: functional index not used, looping simpler query just faster
Date: 2008-07-10 09:40:40
Message-ID: 20080710114040.33f8c07c@dawn.webthatworks.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I've this:

CREATE TABLE catalog_brands
(
brandid serial NOT NULL,
"name" character varying(64) NOT NULL,
delivery smallint NOT NULL DEFAULT (24 * 15),
deliverymessage character varying(64),
brandtypeid integer,
brandgroupid integer,
CONSTRAINT catalog_brands_pkey PRIMARY KEY (brandid),
CONSTRAINT catalog_brands_brandgroupid_fkey FOREIGN KEY
(brandgroupid) REFERENCES catalog_brandgroup (brandgroupid) MATCH
SIMPLE ON UPDATE NO ACTION ON DELETE SET NULL,
CONSTRAINT catalog_brands_brandtypeid_fkey FOREIGN KEY
(brandtypeid) REFERENCES catalog_brandtype (brandtypeid) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE SET NULL
);

CREATE INDEX catalog_brands_name_index
ON catalog_brands
USING btree
(upper(name::text));

CREATE TABLE catalog_items
(
itemid bigint NOT NULL,
brand integer NOT NULL,
name character varying(256) NOT NULL,
/* snip */
datainserimento timestamp without time zone,
dapub smallint,
CONSTRAINT catalog_items_statusid_fkey FOREIGN KEY (statusid)
REFERENCES catalog_item_status (statusid) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE INDEX catalog_items_brands_index
ON catalog_items
USING btree
(upper(brands::text));

CREATE UNIQUE INDEX catalog_items_itemsid_index
ON catalog_items
USING btree
(itemid);
ALTER TABLE catalog_items CLUSTER ON catalog_items_itemsid_index;

catalog_items contains ~ 650K records
catalog_brands 44 records

Now I try this:

explain select i1.brands, i1.name, i1.dataPub, i1.datainserimento
from catalog_items i1
inner join catalog_brands b1 on upper(i1.brands)=upper(b1.name)
where i1.ItemID in (
select i2.ItemID from catalog_items i2
inner join catalog_brands b2 on upper(i2.brands)=upper(b2.name)
where i1.brands=i2.brands
and i2.dataPub>(now() - interval '8 month') and
i2.datainserimento>(now() - interval '6 month') order by
i2.datainserimento desc limit 3);

And I got this:

"Nested Loop (cost=0.00..6383568361.87 rows=74378 width=82)"
" -> Seq Scan on catalog_brands b1 (cost=0.00..1.44 rows=44 width=18)"
" -> Index Scan using catalog_items_brands_index on catalog_items i1 (cost=0.00..145081069.53 rows=1690 width=82)"
" Index Cond: (upper((i1.brands)::text) = upper(("outer".name)::text))"
" Filter: (subplan)"
" SubPlan"
" -> Limit (cost=42906.81..42906.82 rows=1 width=16)"
" -> Sort (cost=42906.81..42906.82 rows=1 width=16)"
" Sort Key: i2.datainserimento"
" -> Nested Loop (cost=0.00..42906.80 rows=1 width=16)"
" Join Filter: (upper(("outer".brands)::text) = upper(("inner".name)::text))"
" -> Seq Scan on catalog_items i2 (cost=0.00..42904.59 rows=1 width=34)"
" Filter: ((($0)::text = (brands)::text) AND (datapub > (now() - '8 mons'::interval)) AND (datainserimento > (now() - '6 mons'::interval)))"
" -> Seq Scan on catalog_brands b2 (cost=0.00..1.44 rows=44 width=18)"

I never waited enough to see results from the above... several
minutes over a 2xXeon 4Gb ram.

A simpler
select name, brands from catalog_items where
upper(brands)=upper('LARGEST GROUP') order by datainserimento desc
limit 3;

finishes in few seconds. Iterating over 44 groups does look to be
much faster than the more complicated query.

"Limit (cost=9503.62..9503.63 rows=3 width=74)"
" -> Sort (cost=9503.62..9512.08 rows=3381 width=74)"
" Sort Key: datainserimento"
" -> Bitmap Heap Scan on catalog_items (cost=29.84..9305.44 rows=3381 width=74)"
" Recheck Cond: (upper((brands)::text) = 'CAMBRIDGE UNIVERSITY PRESS'::text)"
" -> Bitmap Index Scan on catalog_items_brands_index (cost=0.00..29.84 rows=3381 width=0)"
" Index Cond: (upper((brands)::text) = 'CAMBRIDGE UNIVERSITY PRESS'::text)"

Even
select count(*), i1.brands from catalog_items i1
inner join catalog_brands b1 on
upper(b1.name)=upper(i1.brands)
group by i1.brands order by count(*)

takes from few seconds to less than 1 sec.

I could actually loop inside plpgsql but... well I'd like to
understand how things work.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Martijn van Oosterhout 2008-07-10 09:50:01 Re: functional index not used, looping simpler query just faster
Previous Message Leif B. Kristensen 2008-07-10 09:08:46 Re: User-Defined Variables