Skip site navigation (1) Skip section navigation (2)

Impact of foreign keys on a simple count(*) ?

From: "C(dot) Bensend" <benny(at)bennyvision(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Impact of foreign keys on a simple count(*) ?
Date: 2004-10-09 03:37:33
Message-ID: 51319.63.227.74.41.1097293075.squirrel@63.227.74.41 (view raw or flat)
Thread:
Lists: pgsql-sql
Hey folks (long email, my apologies),

   I am wrapping up my schema upgrade, and I just noticed a real
show-stopper for me...  Here is the "before" table structure:

 email_id          | integer                     | not null
 joejob            | boolean                     | default false
 bayes_poison      | boolean                     | default false
 perm_error        | boolean                     | default false
 bait_id           | integer                     | default 0
 num_recip         | integer                     | default 1
 product           | integer                     | default 1043
 date_received     | timestamp with time zone    |
 only_date         | date                        |
 only_time         | time with time zone         |
 maildir_file      | character varying(64)       |
 errors_to         | character varying(512)      |
 reply_to          | character varying(512)      |
 spammer           | character varying(512)      |
 return_path       | character varying(512)      |
 received_from     | character varying(512)      |
 message_id        | character varying(512)      |
 target_domain     | character varying(512)      |
 mail_date         | character varying(512)      |
 x_priority        | character varying(512)      |
 x_msmail_priority | character varying(512)      |
 x_mimeole         | character varying(512)      |
 mime_version      | character varying(512)      |
 subject           | character varying(1024)     |
 mail_to           | character varying(2048)     |
 x_mailer          | character varying(2048)     |
 content_type      | character varying(2048)     |
 user_agent        | character varying(2048)     |
 cc                | character varying(2048)     |
 comments          | character varying(8192)     |
 last_mod          | timestamp without time zone | default
('now'::text)::timest
amp(6) with time zone
Indexes:
    "emails_pkey" primary key, btree (email_id)
    "emails_idx_bait_id" btree (bait_id)
    "emails_idx_mail_to" btree (mail_to)
    "emails_idx_only_date" btree (only_date)
    "emails_idx_only_time" btree (only_time)
    "emails_idx_product" btree (product)
    "emails_idx_received_from" btree (received_from)
    "emails_idx_subject" btree (subject)
    "emails_idx_target_domain" btree (target_domain)


And here is the "after" structure:

 email_id          | integer                  | not null
 joejob            | boolean                  | default false
 bayes_poison      | boolean                  | default false
 perm_error        | boolean                  | default false
 num_recip         | integer                  | default 1
 mydom_id          | integer                  | default 0
 spamv_id          | integer                  | default 1053
 spammer_id        | integer                  | default 1003
 last_mod          | timestamp with time zone | default
('now'::text)::timestamp
 with time zone
 bait_id           | integer                  |
 product_id        | integer                  |
 date_received     | timestamp with time zone |
 only_date         | date                     |
 only_time         | time with time zone      |
 maildir_file      | character varying(128)   |
 x_priority        | character varying(128)   |
 x_msmail_priority | character varying(128)   |
 x_mimeole         | character varying(128)   |
 mime_version      | character varying(512)   |
 received_from     | character varying(512)   |
 content_type      | character varying(512)   |
 errors_to         | character varying(512)   |
 user_agent        | character varying(512)   |
 mail_date         | character varying(512)   |
 x_mailer          | character varying(512)   |
 return_path       | character varying(512)   |
 message_id        | character varying(512)   |
 reply_to          | character varying(512)   |
 subject           | character varying(1024)  |
 mail_to           | character varying(1024)  |
 cc                | character varying(2048)  |
Indexes:
    "emails_pkey" primary key, btree (email_id)
    "emails_maildir_file_key" unique, btree (maildir_file)
Foreign-key constraints:
    "$1" FOREIGN KEY (bait_id) REFERENCES bait(bait_id) ON UPDATE CASCADE ON
DELETE SET NULL
    "$2" FOREIGN KEY (product_id) REFERENCES products(product_id) ON UPDATE
CASCADE ON DELETE SET NULL
    "$3" FOREIGN KEY (mydom_id) REFERENCES my_domains(mydom_id) ON UPDATE
CASCADE ON DELETE SET NULL
    "$4" FOREIGN KEY (spamv_id) REFERENCES spamvertisers(spamv_id) ON UPDATE
CASCADE ON DELETE SET NULL
    "$5" FOREIGN KEY (spammer_id) REFERENCES spammers(spammer_id) ON UPDATE
CASCADE ON DELETE SET NULL


   I am absolutely aware of the fact that the new schema can still use
some work.  :)  But, it's a step in the right direction as I muddle my
way through this.  Not to mention, I really did need those foreign keys.

   The problem is this:

Original schema:

prod01=> EXPLAIN ANALYZE SELECT count(*) FROM emails;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6345.61..6345.61 rows=1 width=0) (actual
time=1066.727..1066.728 rows=1 loops=1)
   ->  Seq Scan on emails  (cost=0.00..6121.49 rows=89649 width=0) (actual
time=18.214..980.040 rows=89649 loops=1)
 Total runtime: 1066.931 ms
(3 rows)



New schema:

prod01=> EXPLAIN ANALYZE SELECT count(*) FROM emails;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=22.50..22.50 rows=1 width=0) (actual
time=40818.750..40818.752 rows=1 loops=1)
   ->  Seq Scan on emails  (cost=0.00..20.00 rows=1000 width=0) (actual
time=38801.127..40458.369 rows=89649 loops=1)
 Total runtime: 40819.115 ms


   Is it the addition of the multiple foreign keys that is slowing this
down so much?  I have several calculations I need to do with the count,
so this is a big thing for me.

   Suggestions appreciated, both on the immediate problem or regarding
the new schema.  Even "you're an idiot" is fine, if it accompanies a
bit of advice.  :)

Thanks!

Benny


-- 
"Even if a man chops off your hand with a sword, you still have two nice,
sharp bones to stick in his eyes."
                                                      -- .sig on Slashdot




Responses

pgsql-sql by date

Next:From: Tom LaneDate: 2004-10-09 03:49:23
Subject: Re: Impact of foreign keys on a simple count(*) ?
Previous:From: Greg StarkDate: 2004-10-09 03:17:38
Subject: Re: help on a query

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group