Re: Performs WAY better with enable_seqscan = off

From: Brendan Duddridge <brendan(at)clickspace(dot)com>
To: Ragnar <gnari(at)hive(dot)is>
Cc: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performs WAY better with enable_seqscan = off
Date: 2006-05-21 20:01:14
Message-ID: BA21CBE9-E89E-4DB6-BA04-FEED87E9B4A0@clickspace.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> is there some reason for the complicated form of the
> join conditions in the subselect?

Yes, the simpler form query definitely works, but it's not always as
fast as the index version with the complicated join syntax. Although
even that query varies significantly with different category_id
values. Not sure why. Sometimes it finishes in 150 ms, other times it
takes over a second.

Here's the explain plan from your query:

explain analyze select ac.attribute_id, la.name, ac.sort_order from
attribute_category ac, localized_attribute la where ac.category_id =
1001402 and la.locale_id = 1000001 and ac.is_browsable = 'true' and
la.attribute_id = ac.attribute_id and exists (select 'x' from
product_attribute_value pav, category_product cp where pav.product_id
= cp.product_id and pav.attribute_id = ac.attribute_id and
pav.status_code is null and cp.category_id= '1001402' and
cp.is_visible = 'true') order by ac.sort_order, la.name asc;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
---------------------------------------------------------------
Sort (cost=6343.18..6343.20 rows=7 width=34) (actual
time=2244.241..2244.242 rows=2 loops=1)
Sort Key: ac.sort_order, la.name
-> Nested Loop (cost=2.00..6343.08 rows=7 width=34) (actual
time=1831.970..2244.209 rows=2 loops=1)
-> Index Scan using attribute_category__category_id_fk_idx
on attribute_category ac (cost=0.00..6321.95 rows=7 width=8) (actual
time=1831.938..2244.142 rows=2 loops=1)
Index Cond: (category_id = 1001402)
Filter: (((is_browsable)::text = 'true'::text) AND
(subplan))
SubPlan
-> Nested Loop (cost=2.00..10458.04 rows=30
width=0) (actual time=320.572..320.572 rows=0 loops=7)
-> Index Scan using
product_attribute_value__attribute_id_fk_idx on
product_attribute_value pav (cost=0.00..2661.39 rows=2572 width=4)
(actual time=0.020..33.589 rows=18468 loops=7)
Index Cond: (attribute_id = $0)
Filter: (status_code IS NULL)
-> Bitmap Heap Scan on category_product cp
(cost=2.00..3.02 rows=1 width=4) (actual time=0.011..0.011 rows=0
loops=129274)
Recheck Cond: ("outer".product_id =
cp.product_id)
Filter: ((category_id = 1001402) AND
((is_visible)::text = 'true'::text))
-> Bitmap Index Scan on
x_category_product__product_id_fk_idx (cost=0.00..2.00 rows=1
width=0) (actual time=0.008..0.008 rows=1 loops=129274)
Index Cond: ("outer".product_id =
cp.product_id)
-> Bitmap Heap Scan on localized_attribute la
(cost=2.00..3.01 rows=1 width=30) (actual time=0.019..0.019 rows=1
loops=2)
Recheck Cond: (la.attribute_id = "outer".attribute_id)
Filter: (locale_id = 1000001)
-> Bitmap Index Scan on
localized_attribute__attribute_id_fk_idx (cost=0.00..2.00 rows=1
width=0) (actual time=0.015..0.015 rows=1 loops=2)
Index Cond: (la.attribute_id =
"outer".attribute_id)
Total runtime: 2244.542 ms

Here's the schema for the two tables involved with the sub-select:

\d category_product;
Table "public.category_product"
Column | Type | Modifiers
---------------------+------------------------+-----------
category_id | integer | not null
product_id | integer | not null
en_name_sort_order | integer |
fr_name_sort_order | integer |
merchant_sort_order | integer |
price_sort_order | integer |
merchant_count | integer |
is_active | character varying(5) |
product_is_active | character varying(5) |
product_status_code | character varying(32) |
product_name_en | character varying(512) |
product_name_fr | character varying(512) |
product_click_count | integer |
is_visible | character varying(5) |
is_pending_visible | character varying(5) |
min_price_cad | numeric(12,4) |
max_price_cad | numeric(12,4) |
Indexes:
"x_category_product_pk" PRIMARY KEY, btree (category_id,
product_id)
"category_product__cat_id_is_visible_idx" btree
(((category_id::text || '.'::text) || is_visible::text))
"category_product__cat_id_prod_is_act_status_idx" btree
(category_id, product_is_active, product_status_code)
"category_product__category_id_is_active_and_status_idx" btree
(category_id, product_is_active, product_status_code)
"category_product__is_active_idx" btree (is_active)
"category_product__lower_product_name_en_idx" btree (lower
(product_name_en::text))
"category_product__lower_product_name_fr_idx" btree (lower
(product_name_fr::text))
"category_product__merchant_sort_order_idx" btree
(merchant_sort_order)
"category_product__min_price_cad_idx" btree (min_price_cad)
"category_product__product_id_category_id_status_idx" btree
(product_id, category_id, product_is_active, product_status_code)
"x_category_product__category_id_fk_idx" btree (category_id)
CLUSTER
"x_category_product__product_id_fk_idx" btree (product_id)
Foreign-key constraints:
"x_category_product_category_fk" FOREIGN KEY (category_id)
REFERENCES category(category_id) DEFERRABLE INITIALLY DEFERRED
"x_category_product_product_fk" FOREIGN KEY (product_id)
REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED

and

\d product_attribute_value
Table "public.product_attribute_value"
Column | Type | Modifiers
----------------------------+-----------------------+-----------
attribute_id | integer | not null
attribute_unit_id | integer |
attribute_value_id | integer |
boolean_value | character varying(5) |
decimal_value | numeric(30,10) |
product_attribute_value_id | integer | not null
product_id | integer | not null
product_reference_id | integer |
status_code | character varying(32) |
Indexes:
"product_attribute_value_pk" PRIMARY KEY, btree
(product_attribute_value_id)
"product_attribute_value__attribute_id_fk_idx" btree (attribute_id)
"product_attribute_value__attribute_unit_id_fk_idx" btree
(attribute_unit_id)
"product_attribute_value__attribute_value_id_fk_idx" btree
(attribute_value_id)
"product_attribute_value__normalized_value_idx" btree
(normalized_value(decimal_value, attribute_unit_id))
"product_attribute_value__prod_id_att_id_status_is_null_ids"
btree (((product_id::text || '.'::text) || attribute_id::text)) WHERE
status_code IS NULL
"product_attribute_value__prod_id_att_val_id_status_is_null_idx"
btree (((product_id::text || '.'::text) || attribute_value_id::text))
WHERE status_code IS NULL
"product_attribute_value__product_id_fk_idx" btree (product_id)
CLUSTER
"product_attribute_value__product_reference_id_fk_idx" btree
(product_reference_id)
Foreign-key constraints:
"product_attribute_value_attribute_fk" FOREIGN KEY
(attribute_id) REFERENCES attribute(attribute_id) DEFERRABLE
INITIALLY DEFERRED
"product_attribute_value_attributeunit_fk" FOREIGN KEY
(attribute_unit_id) REFERENCES attribute_unit(attribute_unit_id)
DEFERRABLE INITIALLY DEFERRED
"product_attribute_value_attributevalue_fk" FOREIGN KEY
(attribute_value_id) REFERENCES attribute_value(attribute_value_id)
DEFERRABLE INITIALLY DEFERRED
"product_attribute_value_product_fk" FOREIGN KEY (product_id)
REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED
"product_attribute_value_productreference_fk" FOREIGN KEY
(product_reference_id) REFERENCES product(product_id) DEFERRABLE
INITIALLY DEFERRED

When the query planner uses the indexes with the concatenated values
and the where clause, the query can be sub-second response times (but
not always depending on the category_id value). By just doing a
regular join as you suggested, it's always slower. The trick is
getting Postgres to use the proper index all the time. And so far the
only way I can do that is by turning off sequential scans, but that's
something I didn't want to do because I don't know how it would
affect the performance of the rest of my application.

Just a note, I have random_page_cost set to 1 to try and get it to
favour index scans. The database machine has 8GB of RAM and I have
effective_cache_size set to 2/3 of that.

Thanks,

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 | brendan(at)clickspace(dot)com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB T2G 0V9

http://www.clickspace.com

On May 21, 2006, at 4:50 AM, Ragnar wrote:

> On sun, 2006-05-21 at 02:21 -0600, Brendan Duddridge wrote:
>> Hi,
>>
>>
>> I have a query that performs WAY better when I have enable_seqscan =
>> off:
>>
>>
>> explain analyze select ac.attribute_id, la.name, ac.sort_order from
>> attribute_category ac, localized_attribute la where ac.category_id =
>> 1001402 and la.locale_id = 1000001 and ac.is_browsable = 'true' and
>> la.attribute_id = ac.attribute_id and exists ( select 'x' from
>> product_attribute_value pav, category_product cp where
>> (pav.product_id
>> || '.' || pav.attribute_id) = (cp.product_id || '.' ||
>> ac.attribute_id) and pav.status_code is null and (cp.category_id ||
>> '.' || cp.is_visible) = '1001402.true') order by (ac.sort_order is
>> null), ac.sort_order, la.name asc;
>
> is there some reason for the complicated form of the
> join conditions in the subselect?
>
> would this not be clearer:
>
> explain analyze
> select ac.attribute_id,
> la.name,
> ac.sort_order
> from attribute_category ac,
> localized_attribute la
> where ac.category_id = 1001402
> and la.locale_id = 1000001
> and ac.is_browsable = 'true'
> and la.attribute_id = ac.attribute_id
> and exists
> (select 'x' from product_attribute_value pav,
> category_product cp
> where pav.product_id = cp.product_id
> and pav.attribute_id = ac.attribute_id
> and pav.status_code is null
> and cp.category_id= '1001402'
> and cp.is_visible = 'true'
> )
> order by (ac.sort_order is null),
> ac.sort_order,
> la.name asc;
>
>
> possibly the planner would have a better time
> figuring out if any indexes are usable or estimating
> the subselect rowcount
>
> gnari
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message David Wheeler 2006-05-21 22:45:14 Re: Benchmarking Function
Previous Message Josh Berkus 2006-05-21 19:23:49 Re: Benchmarking Function