Slow GROUP BY query

From: Stuart Brooks <stuartb(at)cat(dot)co(dot)za>
To: pgsql-sql(at)postgresql(dot)org
Subject: Slow GROUP BY query
Date: 2008-01-29 14:43:31
Message-ID: 479F3B93.9070907@cat.co.za
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I have a very simple table set:

Transactions:
transaction_key PRIMARY KEY
client TEXT
time TIMESTAMP

LineItems
transaction_key INT
amount INT

A query to print the contents of transactions with a sum of the line
item amounts provides a very suboptimal result. The problem seems to be
the GROUP BY clause as it doesn't use the primary index. Rewriting the
query to only group on the transaction_key and returning the max of the
other transaction fields results in a query of <1ms. (see queries below)

Can anyone shed any light here, I would have expected the queries to
take roughly the same time?

Out of interest, since we are grouping by transaction_key which is
unique, surely the other Transaction fields in the group by could be
ignored by the planner?

Thanks
Stuart

(running postgresql 8.2.5 on NetBSD 3)

>> Slow query

EXPLAIN
SELECT t.transaction_key,t.cashier,t.time,SUM(l.amount)
FROM Transactions t JOIN LineItems l USING (transaction_key)
GROUP BY t.transaction_key,t.cashier,t.time
ORDER BY t.transaction_key;
QUERY PLAN
-------------------------------------------------------------------------------------------
Sort (cost=449.16..454.16 rows=2000 width=32)
Sort Key: t.transaction_key
-> HashAggregate (cost=314.50..339.50 rows=2000 width=32)
-> Hash Join (cost=66.00..262.07 rows=5243 width=32)
Hash Cond: (l.transaction_key = t.transaction_key)
-> Seq Scan on lineitems l (cost=0.00..117.43 rows=5243 width=16)
-> Hash (cost=41.00..41.00 rows=2000 width=24)
-> Seq Scan on transactions t (cost=0.00..41.00 rows=2000 width=24)
(8 rows)

>> Fast query

EXPLAIN
SELECT t.transaction_key,MAX(t.cashier),MAX(t.time),SUM(l.amount)
FROM Transactions t JOIN LineItems l USING (transaction_key)
GROUP BY t.transaction_key
ORDER BY t.transaction_key;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.00..459.11 rows=2000 width=32)
-> Merge Join (cost=0.00..371.68 rows=5243 width=32)
Merge Cond: (t.transaction_key = l.transaction_key)
-> Index Scan using transactions_pkey on transactions t (cost=0.00..86.25 rows=2000 width=24)
-> Index Scan using lineitems_transaction_index on lineitems l (cost=0.00..214.90 rows=5243 width=16)
(5 rows)

Browse pgsql-sql by date

  From Date Subject
Next Message Dawid Kuroczko 2008-01-29 15:29:50 JOINing SET returning function.
Previous Message Andreas Joseph Krogh 2008-01-29 10:24:23 Re: Slow Query problem