Re: reforming query for 7.0.2

From: Markus Bertheau <twanger(at)bluetwanger(dot)de>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: postgres sql list <pgsql-sql(at)postgresql(dot)org>
Subject: Re: reforming query for 7.0.2
Date: 2002-01-17 12:56:21
Message-ID: 1011272192.15907.24.camel@entwicklung01.cenes.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Josh,

> Markus,
>
> > how do I reform this query to work with 7.0.2?
>
> Better question: Why are you working with 7.0.2? Even the mass-market Linux
> distros (like Red Hat and SuSE) now come with 7.1.x.

Yeah, I know... An upgrade is definetely in the queue. But it will not
happen before 7.2 (if my imagination of its release date is correct), so
we will most probably go to that.
>
> > select * from personen join (select count(personen_id), personen_id from
> > orders group by personen_id) as ordertemp on personen.personen_id =
> > ordertemp.personen_id
>
> Acutally, this query needs some reforming on its own. While it will work, the
> following version will achieve the same result, much faster, in 7.1 (and 7.2,
> for that matter):
>
> SELECT personen.field1, personen.field2, personen.field3, count(order_id) as
> no_orders
> FROM personen LEFT OUTER JOIN orders ON personen.personen_id =
> orders.personen_id
> GROUP BY personen.field1, personen.field2, personen.field3
>
> ... you see, your subselect above is completely unnecessary. And slower than a
> LEFT OUTER JOIN. Simplicity, simplicity, simplicity!

Thank you for that.

> Now, we have the problem of no LEFT JOIN support in 7.0, so:
>
> SELECT personen.field1, personen.field2, personen.field3, count(order_id) as
> no_orders
> FROM personen JOIN orders ON personen.personen_id = orders.personen_id
> GROUP BY personen.field1, personen.field2, personen.field3
> UNION
> SELECT personen.field1, personen.field2, personen.field3, 0 as no_orders
> FROM personen
> WHERE NOT EXISTS (SELECT personen_id FROM orders WHERE personen_id =
> personen.personen_id);

Well, I actually don't need the left outer join, because I don't want
persons included that have no orders. I think the original query didn't
include them. So I turn the left outer join into a join and have the
same query on both versions :-). (We use 7.1.3 on our development
server).

Thanks again.

But let's take a look at explain. The old query shows a much smaller
cost estimate compared to the new one. Both databases are vacuumed on a
regular (daily) basis. First 7.1.3:

cenes_test=> select version();
version
-------------------------------------------------------------
PostgreSQL 7.1.3 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)

// old query
cenes_test=> explain select * from personen join (select
count(orders_id), personen_id from orders group by personen_id) as
ordertemp on ordertemp.personen_id = personen.personen_id;
NOTICE: QUERY PLAN:

Nested Loop (cost=11.76..42.32 rows=15 width=272)
-> Subquery Scan ordertemp (cost=11.76..12.50 rows=15 width=8)
-> Aggregate (cost=11.76..12.50 rows=15 width=8)
-> Group (cost=11.76..12.13 rows=147 width=8)
-> Sort (cost=11.76..11.76 rows=147 width=8)
-> Seq Scan on orders (cost=0.00..6.47
rows=147 width=8)
-> Index Scan using personen_pkey on personen (cost=0.00..2.02
rows=1 width=264)

EXPLAIN

// new query

cenes_test=> explain select personen.nachname, personen.vorname,
personen.firma, personen.personen_id, count(orders_id) from personen
join orders on personen.personen_id = orders.personen_id group by
personen.nachname, personen.vorname, personen.firma,
personen.personen_id;
NOTICE: QUERY PLAN:

Aggregate (cost=162.53..164.37 rows=15 width=48)
-> Group (cost=162.53..164.00 rows=147 width=48)
-> Sort (cost=162.53..162.53 rows=147 width=48)
-> Merge Join (cost=139.59..157.24 rows=147 width=48)
-> Sort (cost=127.83..127.83 rows=1265 width=40)
-> Seq Scan on personen (cost=0.00..62.65
rows=1265 width=40)
-> Sort (cost=11.76..11.76 rows=147 width=8)
-> Seq Scan on orders (cost=0.00..6.47
rows=147 width=8)

EXPLAIN

// That's a lot more. About row estimates:

cenes_test=> select count(personen_id) from personen;
count
-------
1272
(1 row)

cenes_test=> select count(orders_id) from orders;
count
-------
189
(1 row)

cenes_test=> select count(orders_id) from orders group by personen_id
order by count desc;
count
-------
41
33
15
12
10
8
7
5
4
3
2
2
2
2
2
2
2
1
// 36 x 1 snipped
(54 rows)

On the production db:
cenes=> select version();
version
---------------------------------------------------------------
PostgreSQL 7.0.2 on i686-pc-linux-gnu, compiled by gcc 2.95.2
(1 row)

// old query doesn't work
// new query
cenes=> explain select personen.nachname, personen.vorname,
personen.firma, personen.personen_id, count(orders_id) from personen
join orders on personen.personen_id = orders.personen_id group by
personen.nachname, personen.vorname, personen.firma,
personen.personen_id;
NOTICE: QUERY PLAN:

Aggregate (cost=201.91..204.87 rows=24 width=48)
-> Group (cost=201.91..204.28 rows=237 width=48)
-> Sort (cost=201.91..201.91 rows=237 width=48)
-> Merge Join (cost=170.85..192.56 rows=237 width=48)
-> Sort (cost=18.72..18.72 rows=237 width=8)
-> Seq Scan on orders (cost=0.00..9.37
rows=237 width=8)
-> Sort (cost=152.13..152.13 rows=1500 width=40)
-> Seq Scan on personen (cost=0.00..73.00
rows=1500 width=40)

EXPLAIN

cenes=> select count(personen_id) from personen;
count
-------
1501
(1 row)

cenes=> select count(orders_id) from orders;
count
-------
238
(1 row)

count
-------
34
16
12
12
8
7
6
5
4
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
// the rest is 1
(129 rows)

How's that?

Markus Bertheau

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Jeff Eckermann 2002-01-17 15:13:10 Re: Pattern Matching on Columns
Previous Message Bhuvan A 2002-01-17 05:16:54 Pattern Matching on Columns