ported application having performance issues

From: John Mendenhall <john(at)surfutopia(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: ported application having performance issues
Date: 2005-06-30 22:24:51
Message-ID: 20050630222451.GA17628@calvin.surfutopia.net
Lists: pgsql-performance

pgsql performance gurus,

We ported an application from Oracle to PostgreSQL and
are experiencing roughly a 50% performance hit. I am in
the process of isolating the problem. I have searched the
internet (Google) and tried various things, but only one
seems to work. I am trying to find out whether our
workaround is the only option, or whether I am doing
something terribly wrong.

The original application runs on the following:

hw:
cpu0: SUNW,UltraSPARC-IIi (upaid 0 impl 0x12 ver 0x12 clock 302 MHz)
mem = 393216K (0x18000000)

sw:
Solaris 5.6
Oracle 7.3.2.2.0
Apache 1.3.27
Perl 5.004_04
mod_perl 1.27
DBI 1.20
DBD::Oracle 1.12

The ported application runs on the following:

hw:
unix: [ID 389951 kern.info] mem = 262144K (0x10000000)
rootnex: [ID 466748 kern.info] root nexus = Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 360MHz)

sw:
Solaris 5.9
PostgreSQL 7.4.6
Apache 1.3.33
Perl 5.8.6
mod_perl 1.29
DBI 1.46
DBD::Pg 1.40.1

Based on assistance from another list, we have
tried the following:

(1) Upgraded DBD::Pg to current version 1.43
(2) Ensured all tables are analyzed regularly
(3) Set some memory options in postgresql.conf
(4) Located a handful of slow queries by setting
    log_min_duration_statement to 250
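
For reference, the postgresql.conf settings involved in
(3) and (4) look roughly like this on 7.4 (the values
shown here are illustrative starting points, not our
exact settings):

shared_buffers = 4096              # in 8KB pages; the 7.4 default of 1000 is very low
sort_mem = 8192                    # per-sort working memory, in KB
effective_cache_size = 16384       # 8KB pages the OS is expected to cache
log_min_duration_statement = 250   # log any statement slower than 250 ms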

Future options we will consider are:

(1) Trying other planner settings, such as
    random_page_cost
(2) Upgrading the database server to the current
    version, 8.0.3
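
For (1), random_page_cost can be tried per session before
committing anything to postgresql.conf; lowering it makes
index scans look cheaper to the planner. The value below
is just something to experiment with, not a recommendation:

SET random_page_cost = 2;    -- default is 4; lower favors index scans
EXPLAIN ANALYZE ... ;        -- re-test the slow queries here
RESET random_page_cost;      -- back to the postgresql.conf value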

With our handful of slow queries, we have done
several iterations of changes to determine what
will address the issues.

We have narrowed this down to the direction of a join
and to setting enable_seqscan to off. The table
definitions are at the bottom of this e-mail. There
is one large table (contacts) and one smaller table
(lead_requests). The final SQL is as follows:

SELECT
c.id AS contact_id,
lr.id AS lead_request_id
FROM
lead_requests lr
JOIN contacts c ON (c.id = lr.contact_id)
WHERE
c.partner_id IS NULL
ORDER BY
contact_id

I ran this query against freshly vacuum analyzed tables.

The first run is as follows:

db=> explain analyze SELECT
db-> c.id AS contact_id,
db-> lr.id AS lead_request_id
db-> FROM
db-> lead_requests lr
db-> JOIN contacts c ON (c.id = lr.contact_id)
db-> WHERE
db-> c.partner_id IS NULL
db-> ORDER BY
db-> contact_id
db-> ;
LOG: duration: 4618.133 ms statement: explain analyze SELECT
c.id AS contact_id,
lr.id AS lead_request_id
FROM
lead_requests lr
JOIN contacts c ON (c.id = lr.contact_id)
WHERE
c.partner_id IS NULL
ORDER BY
contact_id
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=4272.84..4520.82 rows=1230 width=21) (actual time=3998.771..4603.739 rows=699 loops=1)
Merge Cond: ("outer".contact_id = "inner".id)
-> Index Scan using lead_requests_contact_id_idx on lead_requests lr (cost=0.00..74.09 rows=1435 width=21) (actual time=0.070..22.431 rows=1430 loops=1)
-> Sort (cost=4272.84..4352.28 rows=31775 width=11) (actual time=3998.554..4130.067 rows=32640 loops=1)
Sort Key: c.id
-> Seq Scan on contacts c (cost=0.00..1896.77 rows=31775 width=11) (actual time=0.040..326.135 rows=32501 loops=1)
Filter: (partner_id IS NULL)
Total runtime: 4611.323 ms
(8 rows)

As you can see, the run time is over 4 seconds.
Then I set enable_seqscan = off.

db=> set enable_seqscan=off;
SET

Then I ran the exact same query:

db=> explain analyze SELECT
db-> c.id AS contact_id,
db-> lr.id AS lead_request_id
db-> FROM
db-> lead_requests lr
db-> JOIN contacts c ON (c.id = lr.contact_id)
db-> WHERE
db-> c.partner_id IS NULL
db-> ORDER BY
db-> contact_id
db-> ;
LOG: duration: 915.304 ms statement: explain analyze SELECT
c.id AS contact_id,
lr.id AS lead_request_id
FROM
lead_requests lr
JOIN contacts c ON (c.id = lr.contact_id)
WHERE
c.partner_id IS NULL
ORDER BY
contact_id
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=0.00..4749.84 rows=1230 width=21) (actual time=0.213..901.315 rows=699 loops=1)
Merge Cond: ("outer".contact_id = "inner".id)
-> Index Scan using lead_requests_contact_id_idx on lead_requests lr (cost=0.00..74.09 rows=1435 width=21) (actual time=0.073..21.448 rows=1430 loops=1)
-> Index Scan using contacts_pkey on contacts c (cost=0.00..4581.30 rows=31775 width=11) (actual time=0.038..524.217 rows=32640 loops=1)
Filter: (partner_id IS NULL)
Total runtime: 903.638 ms
(6 rows)

Under 1 second. Excellent.

The contacts table has 30000+ records.
The lead_requests table has only around 1500 records.
I want the planner to drive the join from the
lead_requests side, since that table is so much smaller.

So, the bottom line is this:
In order to get the performance to an acceptable level
(I can live with under 1 second, though less time would
be better), do I have to set enable_seqscan to off every
time I run this query? Is there a better or more
acceptable way to handle this?
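
To be concrete about what "every time" means, the
workaround we have now is to wrap just this one query in
session settings, along these lines:

SET enable_seqscan = off;
SELECT
c.id AS contact_id,
lr.id AS lead_request_id
FROM
lead_requests lr
JOIN contacts c ON (c.id = lr.contact_id)
WHERE
c.partner_id IS NULL
ORDER BY
contact_id;
SET enable_seqscan = on;

The second SET is there so the setting does not leak into
the rest of the session.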

Thank you very much in advance for any pointers you can
provide. And, if this is the wrong forum for this question,
please let me know and I'll ask it elsewhere.

JohnM

-----
table definitions
-----

-----
db=> \d contacts
Table "db.contacts"
Column | Type | Modifiers
------------------------------+-----------------------------+-----------
id | numeric(38,0) | not null
db_id | character varying(32) |
firstname | character varying(64) |
lastname | character varying(64) |
company | character varying(128) |
email | character varying(256) |
phone | character varying(64) |
address | character varying(128) |
city | character varying(128) |
state | character varying(32) |
postalcode | character varying(16) |
country | character varying(2) | not null
contact_info_modified | character(1) |
token_id | numeric(38,0) |
status_id | numeric(38,0) |
status_last_modified | timestamp without time zone |
notes | character varying(2000) |
demo_schedule | timestamp without time zone |
partner_id | numeric(38,0) |
prev_partner_id | numeric(38,0) |
prev_prev_partner_id | numeric(38,0) |
site_last_visited | timestamp without time zone |
source_id | numeric(4,0) |
demo_requested | timestamp without time zone |
sourcebook_requested | timestamp without time zone |
zip | numeric(8,0) |
suffix | numeric(8,0) |
feedback_request_sent | timestamp without time zone |
products_sold | character varying(512) |
other_brand | character varying(512) |
printsample_requested | timestamp without time zone |
indoor_media_sample | timestamp without time zone |
outdoor_media_sample | timestamp without time zone |
printers_owned | character varying(256) |
business_type | character varying(256) |
printers_owned2 | character varying(256) |
contact_quality_id | numeric(38,0) |
est_annual_value | numeric(38,2) |
likelyhood_of_closing | numeric(38,0) |
priority | numeric(38,0) |
business_type_id | numeric(38,0) |
lead_last_modified | timestamp without time zone |
lead_value | numeric(38,2) |
channel_contact_flag | character(1) |
request_status_last_modified | timestamp without time zone |
master_key_number | numeric(38,0) |
master_key_token | character varying(32) |
current_media_cust | character(1) |
kodak_media_id | numeric(38,0) |
printer_sample_id | numeric(38,0) |
quantity_used_id | numeric(38,0) |
rip_used_id | numeric(38,0) |
language_code | character varying(3) |
region_id | numeric(38,0) | not null
lead_deleted | timestamp without time zone |
last_request_set_status_id | numeric(38,0) |
address2 | character varying(128) |
media_usage_id | numeric(38,0) |
Indexes:
"contacts_pkey" primary key, btree (id)
"contacts_partner_id_idx" btree (partner_id)
"contacts_partner_id_null_idx" btree (partner_id) WHERE (partner_id IS NULL)
"contacts_token_id_idx" btree (token_id)
Check constraints:
"sys_c0050644" CHECK (country IS NOT NULL)
"sys_c0050643" CHECK (id IS NOT NULL)
"sys_c0050645" CHECK (region_id IS NOT NULL)
Triggers:
insert_master_key BEFORE INSERT ON contacts FOR EACH ROW EXECUTE PROCEDURE pg_fct_insert_master_key()
-----

-----
db=> \d lead_requests
Table "db.lead_requests"
Column | Type | Modifiers
-----------------------+-----------------------------+-----------
id | numeric(38,0) | not null
contact_id | numeric(38,0) | not null
request_id | numeric(38,0) | not null
date_requested | timestamp without time zone | not null
must_update_by | timestamp without time zone |
date_satisfied | timestamp without time zone |
status_id | numeric(38,0) |
request_scheduled | timestamp without time zone |
session_log_id | numeric(38,0) |
notes | character varying(2000) |
status_last_modified | timestamp without time zone |
reminder_last_sent | timestamp without time zone |
data | character varying(2000) |
fulfillment_status_id | numeric(38,0) |
Indexes:
"lead_requests_pkey" primary key, btree (id)
"lead_requests_contact_id_idx" btree (contact_id)
"lead_requests_request_id_idx" btree (request_id)
Check constraints:
"sys_c0049877" CHECK (request_id IS NOT NULL)
"sys_c0049876" CHECK (contact_id IS NOT NULL)
"sys_c0049878" CHECK (date_requested IS NOT NULL)
-----

--
John Mendenhall
john(at)surfutopia(dot)net
surf utopia
internet services
