3-table query optimization

From: Michal Taborsky - Internet Mall <michal(dot)taborsky(at)mall(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: 3-table query optimization
Date: 2006-08-09 14:06:35
Message-ID: 44D9EBEB.9010704@mall.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello everyone.

My (simplified) database structure is:

a) table product (150000 rows)
product_id BIGINT PRIMARY KEY
title TEXT
...

b) table action (5000 rows)
action_id BIGINT PRIMARY KEY
product_id BIGINT, FK to product
shop_group_id INTEGER (there are about 5 groups, distributed about evenly)

c) table product_program (500000 rows)
program_id BIGINT (there are about 50 unique)
product_id BIGINT, FK to product

I need to query products, which are in action table of specific group
and in product_program for a specific program_id. The query is taking
too long to my liking My query is:

SELECT product.product_id
FROM action
JOIN product ON (product.product_id=action.product_id)
WHERE action.shop_group_id=1
AND EXISTS (SELECT 1
FROM catalog.product_program
WHERE product_id=product.product_id
AND product_program.program_id =1104322
)

QUERY PLAN
Nested Loop (cost=0.00..18073.81 rows=1220 width=8) (actual
time=10.153..2705.891 rows=636 loops=1)
-> Seq Scan on "action" (cost=0.00..135.74 rows=2439 width=8)
(actual time=8.108..36.684 rows=2406 loops=1)
Filter: (shop_group_id = 1)
-> Index Scan using product_pkey on product (cost=0.00..7.34 rows=1
width=8) (actual time=1.031..1.097 rows=0 loops=2406)
Index Cond: ((product.product_id)::bigint =
("outer".product_id)::bigint)
Filter: (subplan)
SubPlan
-> Index Scan using product_program_pkey on product_program
(cost=0.00..4.33 rows=1 width=0) (actual time=0.455..0.455 rows=0
loops=2406)
Index Cond: (((program_id)::bigint = 1104322) AND
((product_id)::bigint = ($0)::bigint))
Total runtime: 2708.575 ms

I also tried this:

SELECT product.product_id
FROM action
JOIN product ON (product.product_id=action.product_id)
JOIN catalog.product_program ON (
product_program.product_id=product.product_id AND
product_program.program_id =1104322)
WHERE action.shop_group_id=1

With about the same results (a bit better, but for different groups it
was vice versa):

QUERY PLAN
Nested Loop (cost=141.84..3494.91 rows=139 width=8) (actual
time=118.584..1295.303 rows=636 loops=1)
-> Hash Join (cost=141.84..2729.11 rows=253 width=16) (actual
time=118.483..231.103 rows=636 loops=1)
Hash Cond: (("outer".product_id)::bigint =
("inner".product_id)::bigint)
-> Index Scan using product_program_pkey on product_program
(cost=0.00..2470.04 rows=7647 width=8) (actual time=0.047..73.514
rows=7468 loops=1)
Index Cond: ((program_id)::bigint = 1104322)
-> Hash (cost=135.74..135.74 rows=2439 width=8) (actual
time=118.114..118.114 rows=0 loops=1)
-> Seq Scan on "action" (cost=0.00..135.74 rows=2439
width=8) (actual time=0.019..106.864 rows=2406 loops=1)
Filter: (shop_group_id = 1)
-> Index Scan using product_pkey on product (cost=0.00..3.01 rows=1
width=8) (actual time=1.300..1.655 rows=1 loops=636)
Index Cond: (("outer".product_id)::bigint =
(product.product_id)::bigint)

Any ideas if this is really the best I can expect, or is there something
amiss there and my query is wrong for this type of task? My gut feeling
tells me, that this kind of query should be a lot faster. The hardware
is Dual Xeon with enough of RAM and other operations run just fine.

Thank you.

--
Michal Táborský

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Patrice Beliveau 2006-08-09 14:23:01 Re: Optimizing queries
Previous Message Merlin Moncure 2006-08-09 13:20:38 Re: Hardware upgraded but performance still ain't good enough