query optimization scenarios 17,701 times faster!!!

From: "Robert Dyas" <rdyas(at)adelphia(dot)net>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: query optimization scenarios 17,701 times faster!!!
Date: 2003-04-23 18:04:02
Message-ID: MGEFJOBFIEAIADIKAMEKAEIKCIAA.rdyas@adelphia.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi Everyone - this is my first post (but I have been lurking on and off for
a couple of years). Congratulations on a steadily improving product!

This is not a question, just some observations on performance that I thought
might trigger someone's thinking on ways to improve query optimization.

The following is a list of query pairs (one fast, one slow) that must
produce identical results by definition (and do), but have very different
execution times. Especially the last example. I did some development work on
a commercial SQL database product 7 years ago (names will not be used), so
although I am clueless about PostgreSQL internals, I (think) I have a grip
on some of the query optimization issues (though not necessarily a grip on
life). The data set used for all of these queries was very small - most
tables had a few hundred records or less. No, its not very scientific, but I
believe its illustrative none-the-less.

I'll just make a couple of observations on the last query and leave everyone
else to reach their own conclusions.

1) the two versions of the last query must produce identical results by
definition (and they do)
2) it appears that the optimizer is doing all of the join work before ever
considering the impact of where clause restrictions. (this may not be the
case, but it appears so)
3) It could have said, hey, I have a where clause restriction on the primary
key that is equal to a fixed value. So I have a single row from that table
to deal with, all of the columns come from that table too, and further its
left joined to the rest of the crap so I can safely ignore it.

I always think its illustrative to look at extreme examples like this to
point out optimizations that may be overlooked. If I ever get some free
time, I look forward to contributing to this wonderful project!

NOTES:
1) I didn't include the schema to keep this post reasonable. send email to
rdyas(at)adelphia(dot)net if you want the schema to look into this further.
2) the primary keys for the following tables are
org_milestones.id
tasks.task_id
contacts.contact_id
organizations.org_id

EXPLAIN ANALYZE SELECT DISTINCT org_milestones.completed_on,
org_milestones.id, org_milestones.milestone_id, org_milestones.notes,
org_milestones.org_id FROM org_milestones RIGHT OUTER JOIN organizations ON
(org_milestones.org_id = organizations.org_id) LEFT OUTER JOIN contacts ON
(contacts.org_id = organizations.org_id) LEFT OUTER JOIN tasks ON
(tasks.org_id = organizations.org_id) WHERE (organizations.org_id = 71)
ORDER BY org_milestones.completed_on, org_milestones.id
79.20 msec

EXPLAIN ANALYZE SELECT org_milestones.completed_on, org_milestones.id,
org_milestones.milestone_id, org_milestones.notes, org_milestones.org_id
FROM org_milestones RIGHT OUTER JOIN organizations ON (org_milestones.org_id
= organizations.org_id) WHERE (organizations.org_id = 71) ORDER BY
org_milestones.completed_on, org_milestones.id
6.44 msec
= 12 times faster

---------------
EXPLAIN ANALYZE SELECT DISTINCT tasks.completed, tasks.contact_id,
tasks.created_by, tasks.notes, tasks.objective, tasks.org_id, tasks.outcome,
tasks.priority, tasks.start_date, tasks.start_time, tasks.task_id,
tasks.task_type FROM tasks RIGHT OUTER organizations ON (tasks.org_id =
organizations.org_id) LEFT OUTER JOIN org_milestones ON
(org_milestones.org_id = organizations.org_id) LEFT OUTER JOIN contacts ON
(contacts.org_id = organizations.org_id) WHERE (organizations.org_id = 71)
ORDER BY tasks.start_date, tasks.start_time
2,548.71 msec

EXPLAIN ANALYZE SELECT tasks.completed, tasks.contact_id, tasks.created_by,
tasks.notes, tasks.objective, tasks.org_id, tasks.outcome, tasks.priority,
tasks.start_date, tasks.start_time, tasks.task_id, tasks.task_type FROM
tasks RIGHT OUTER JOIN organizations ON (tasks.org_id =
organizations.org_id) WHERE (organizations.org_id = 71) ORDER BY
tasks.start_date, tasks.start_time
29.21 msec
= 87 times faster

-----------
EXPLAIN ANALYZE SELECT DISTINCT contacts.address_1, contacts.address_2,
contacts.assistant_email, contacts.assistant_name, contacts.assistant_phone,
contacts.city, contacts.contact_id, contacts.email_address,
contacts.first_name, contacts.functional_role, contacts.last_name,
contacts.needs, contacts.notes, contacts.org_id, contacts.pain,
contacts.phone, contacts.reasons, contacts.reports_to, contacts.state,
contacts.title, contacts.zip_code, (contacts.first_name || ' ' ||
contacts.last_name) AS full_name FROM contacts RIGHT OUTER JOIN
organizations ON (contacts.org_id = organizations.org_id) LEFT OUTER JOIN
org_milestones ON (org_milestones.org_id = organizations.org_id) LEFT OUTER
JOIN tasks ON (tasks.org_id = organizations.org_id) WHERE
(organizations.org_id = 71)
2056.83 msec

EXPLAIN ANALYZE SELECT DISTINCT contacts.address_1, contacts.address_2,
contacts.assistant_email, contacts.assistant_name, contacts.assistant_phone,
contacts.city, contacts.contact_id, contacts.email_address,
contacts.first_name, contacts.functional_role, contacts.last_name,
contacts.needs, contacts.notes, contacts.org_id, contacts.pain,
contacts.phone, contacts.reasons, contacts.reports_to, contacts.state,
contacts.title, contacts.zip_code, (contacts.first_name || ' ' ||
contacts.last_name) AS full_name FROM contacts RIGHT OUTER JOIN
organizations ON (contacts.org_id = organizations.org_id) WHERE
(organizations.org_id = 71)
27.41 msec
= 75 times faster
-----------------

EXPLAIN ANALYZE SELECT DISTINCT organizations.city, organizations.inactive,
organizations.java_developers, organizations.name, organizations.org_id,
organizations.overview, organizations.phone, organizations.salesperson,
organizations.state, organizations.time_zone FROM organizations LEFT OUTER
JOIN org_milestones ON (org_milestones.org_id = organizations.org_id) LEFT
OUTER JOIN contacts ON (contacts.org_id = organizations.org_id) LEFT OUTER
JOIN tasks ON (tasks.org_id = organizations.org_id) WHERE
(organizations.org_id = 71) ORDER BY organizations.name
12,567.87 msec

EXPLAIN ANALYZE SELECT organizations.city, organizations.inactive,
organizations.java_developers, organizations.name, organizations.org_id,
organizations.overview, organizations.phone, organizations.salesperson,
organizations.state, organizations.time_zone FROM organizations WHERE
(organizations.org_id = 71)
0.71 msec
= 17,701 times faster
-----------------------

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2003-04-23 18:15:59 Re: ECPG thread-safety
Previous Message Peter Eisentraut 2003-04-23 17:56:52 Re: CLOSE command tag