7.4beta2 vs 7.3.3

From: Gaetano Mendola <mendola(at)bigfoot(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: 7.4beta2 vs 7.3.3
Date: 2003-09-18 19:28:53
Message-ID: 3F6A0775.5040600@bigfoot.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,
my impression on 7.4 performances:

===== Good news =====

How expected the use of clausole 'IN' with Postgres 7.4beta2
now is really fast ( see the result below )

===== Bad news =====

I posted time ago about a slow query:

SELECT ul.*
FROM user_logs ul,
user_data ud,
class_default cd
WHERE ul.id_user = ud.id_user AND
ud.id_class = cd.id_class AND
cd.id_provider = 39;

these are the information about the tables involved:

user_logs: ~1.5 Milion rows
user_data: ~10000 rows
class_default ~100 rows

and I found also that was better do this query
in three steps:

SELECT id_class
FROM class_default
WHERE id_provider = 39;

SELECT id_user
FROM user_data
WHERE id_class IN ( 48 ); <= result of query above

SELECT *
FROM user_logs
WHERE id_user IN ( 11126, ...., 11769 ); <= 43 values result

This last query runs for 10.30 msec !!!

Here the comparison between Postgres7.3.3 and
Postgres7.4beta2 for that original query:

Postgres 7.3.3

test=# explain analyze select ul.*
test-# from user_logs ul,
test-# user_data ud,
test-# class_default cd
test-# where ul.id_user = ud.id_user and
test-# ud.id_class = cd.id_class and
test-# cd.id_provider = 39;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=325.79..36234.20 rows=41957 width=60) (actual
time=6151.29..7022.29 rows=702 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Seq Scan on user_logs ul (cost=0.00..28251.30 rows=1426530
width=48) (actual time=0.02..5427.07 rows=1426530 loops=1)
-> Hash (cost=324.97..324.97 rows=329 width=12) (actual
time=320.97..320.97 rows=0 loops=1)
-> Nested Loop (cost=0.00..324.97 rows=329 width=12) (actual
time=0.24..320.89 rows=43 loops=1)
-> Seq Scan on class_default cd (cost=0.00..1.43
rows=1 width=4) (actual time=0.05..0.07 rows=1 loops=1)
Filter: (id_provider = 39)
-> Index Scan using idx_user_data_class on user_data ud
(cost=0.00..318.55 rows=400 width=8) (actual time=0.19..320.72 rows=43
loops=1)
Index Cond: (ud.id_class = "outer".id_class)
Total runtime: 7023.15 msec
(10 rows)

Postgres 7.4beta2

test=# explain analyze select ul.*
test-# from user_logs ul,
test-# user_data ud,
test-# class_default cd
test-# where ul.id_user = ud.id_user and
test-# ud.id_class = cd.id_class and
test-# cd.id_provider = 39;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=322.94..36127.70 rows=42081 width=47) (actual
time=2629.84..2990.00 rows=702 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Seq Scan on user_logs ul (cost=0.00..28251.30 rows=1426530
width=47) (actual time=0.03..1738.65 rows=1426530 loops=1)
-> Hash (cost=322.12..322.12 rows=330 width=4) (actual
time=0.78..0.78 rows=0 loops=1)
-> Nested Loop (cost=0.00..322.12 rows=330 width=4) (actual
time=0.19..0.71 rows=43 loops=1)
-> Seq Scan on class_default cd (cost=0.00..1.43
rows=1 width=4) (actual time=0.07..0.08 rows=1 loops=1)
Filter: (id_provider = 39)
-> Index Scan using idx_user_data_class on user_data ud
(cost=0.00..315.87 rows=386 width=8) (actual time=0.11..0.54 rows=43
loops=1)
Index Cond: (ud.id_class = "outer".id_class)
Total runtime: 2990.70 msec
(10 rows)

The performance are really improved but not yet 10 msecs.

I tried nesting the 3 queries that I was speak about,
I did it only with 7.4 due the fact with 7.3.3 I never
seen the result.

test=# EXPLAIN ANALYZE SELECT *
test-# FROM user_logs
test-# WHERE id_user in (
test(# SELECT id_user
test(# FROM user_data
test(# WHERE id_class in (
test(# SELECT id_class FROM class_default WHERE id_provider = 39
test(# )
test(# );

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
Hash IN Join (cost=322.94..36127.70 rows=42081 width=47) (actual
time=2626.20..2985.53 rows=702 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Seq Scan on user_logs (cost=0.00..28251.30 rows=1426530
width=47) (actual time=0.03..1731.59 rows=1426530 loops=1)
-> Hash (cost=322.12..322.12 rows=330 width=4) (actual
time=0.80..0.80 rows=0 loops=1)
-> Nested Loop (cost=1.43..322.12 rows=330 width=4) (actual
time=0.22..0.72 rows=43 loops=1)
-> HashAggregate (cost=1.43..1.43 rows=1 width=4)
(actual time=0.10..0.10 rows=1 loops=1)
-> Seq Scan on class_default (cost=0.00..1.43
rows=1 width=4) (actual time=0.07..0.08 rows=1 loops=1)
Filter: (id_provider = 39)
-> Index Scan using idx_user_data_class on user_data
(cost=0.00..315.87 rows=386 width=8) (actual time=0.11..0.54 rows=43
loops=1)
Index Cond: (user_data.id_class = "outer".id_class)
Total runtime: 2986.33 msec
(11 rows)

How you can see with 7.4 the two queries ( the original with the join
and with the IN nested ) are performing at the same way.

Regards
Gaetano Mendola

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2003-09-18 19:54:10 Re: Killing the backend to cancel a long waiting query
Previous Message Paulo Scardine 2003-09-18 19:19:48 Re: Killing the backend to cancel a long waiting query