Skip site navigation (1) Skip section navigation (2)

[8.1.4] Help optimizing query

From: Hayes <aperez(at)hayesinc(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: [8.1.4] Help optimizing query
Date: 2006-08-26 20:11:54
Message-ID: aperez-2D52F6.16115426082006@news.hub.org (view raw or flat)
Thread:
Lists: pgsql-performance
Hi all,

I need help making the below query go faster.  There are about 5 of 
these being done
in one report.  Some use joins instead of subselects but they all go 
about the same speed.
I'm not much of a SQL person so I just threw indices on everything 
involved but it still
takes about at least 19sec and sometimes as much as 60s.  I'd be happy 
to get it to about
5s.

Other info:  random_page_cost is 4 as I/O on this box as kinda slow.  
It's a converted desktop with a single IDE drive.  shared_buffers is 
12500 and effective_page_cache is 100M.  I upped the statistics on 
entry_date and session_id to 1000.  I analyzed the tables after 
modifying the statistics.  The actual deployment platform is a lot 
beefier but I'd like these queries to at least be tolerable on this 
machine.

I can see that the estimate for the GroupAggregate is off, if I'm 
interpreting things correctly, but I don't know what to do about it.

tia,
arturo

Query:

SELECT subscription_id, 
       to_char(sum(session_length), 'HH24:MI:SS') as session_length,
       sum(hits) as hits,
       2006 as theYear,
       2 as theQuarter,
       sum(count) as count
FROM (
     SELECT subscription_id,
       count(distinct session_id) as count, 
       age(MAX(entry_date),MIN(entry_date)) as session_length, 
       COUNT(action) as hits
     FROM
   extended_user JOIN user_tracking USING (user_id)
   WHERE subscription_id > 0 AND
         EXTRACT(year from entry_date) = 2006 AND 
         EXTRACT(quarter from entry_date) = 2
         GROUP BY session_id,
          subscription_id
          ) as session_stuff
              WHERE subscription_id > 0
         GROUP BY subscription_id
         ORDER BY subscription_id;


Sort  (cost=123305.88..123306.38 rows=200 width=36) (actual 
time=75039.706..75040.500 rows=258 loops=1)
   Sort Key: session_stuff.subscription_id
   ->  HashAggregate  (cost=123294.24..123298.24 rows=200 width=36) 
(actual time=75036.487..75038.360 rows=258 loops=1)
         ->  GroupAggregate  (cost=108839.34..118475.94 rows=240915 
width=72) (actual time=68016.583..74702.710 rows=38369 loops=1)
               ->  Sort  (cost=108839.34..109441.63 rows=240915 
width=72) (actual time=67978.193..68982.962 rows=245727 loops=1)
                     Sort Key: user_tracking.session_id, 
extended_user.subscription_id
                     ->  Hash Join  (cost=7746.59..75492.37 rows=240915 
width=72) (actual time=16944.487..50737.230 rows=245727 loops=1)
                           Hash Cond: ("outer".user_id = "inner".user_id)
                           ->  Bitmap Heap Scan on user_tracking  
(cost=7524.10..68644.10 rows=240950 width=72) (actual 
time=16843.695..48306.383 rows=258923 loops=1)
                                 Recheck Cond: 
((date_part('quarter'::text, entry_date) = 2::double precision) AND 
(date_part('year'::text, entry_date) = 2006::double precision))
                                 ->  BitmapAnd  (cost=7524.10..7524.10 
rows=240950 width=0) (actual time=16779.178..16779.178 rows=0 loops=1)
                                       ->  Bitmap Index Scan on 
user_tracking_quarter_idx  (cost=0.00..3331.51 rows=533288 width=0) 
(actual time=9079.545..9079.545 rows=533492 loops=1)
                                             Index Cond: 
(date_part('quarter'::text, entry_date) = 2::double precision)
                                       ->  Bitmap Index Scan on 
user_tracking_year_idx  (cost=0.00..4192.34 rows=671239 width=0) (actual 
time=7685.906..7685.906 rows=671787 loops=1)
                                             Index Cond: 
(date_part('year'::text, entry_date) = 2006::double precision)
                           ->  Hash  (cost=206.42..206.42 rows=6428 
width=8) (actual time=100.754..100.754 rows=6411 loops=1)
                                 ->  Seq Scan on extended_user  
(cost=0.00..206.42 rows=6428 width=8) (actual time=0.020..28.873 
rows=6411 loops=1)
                                       Filter: ((subscription_id > 0) 
AND (subscription_id > 0))
 Total runtime: 75069.453 ms

Tables:

This one has about 6-7k rows.

               Table "public.extended_user"
      Column       |           Type           | Modifiers 
-------------------+--------------------------+-----------
 create_date       | timestamp with time zone | not null
 email             | character varying(99)    | 
 first_name        | character varying(99)    | not null
 last_name         | character varying(99)    | not null
 license_agreement | boolean                  | not null
 license_date      | timestamp with time zone | 
 password          | character varying(32)    | not null
 subscription_id   | integer                  | not null
 user_id           | integer                  | not null
 user_name         | character varying(99)    | not null
Indexes:
    "extended_user_pkey" PRIMARY KEY, btree (user_id)
    "extended_user_subscription_id_idx" btree (subscription_id)
    "extended_user_subscription_idx" btree (subscription_id)
Foreign-key constraints:
    "extended_user_subscription_id_fkey" FOREIGN KEY (subscription_id) 
REFERENCES subscription(subscription_id) DEFERRABLE INITIALLY DEFERRED

This one has about 2k rows.

               Table "public.subscription"
      Column      |           Type           | Modifiers 
------------------+--------------------------+-----------
 allow_printing   | boolean                  | not null
 company_id       | character varying(50)    | not null
 company_name     | character varying(100)   | not null
 end_date         | timestamp with time zone | 
 licenses         | integer                  | not null
 pass_through_key | character varying(50)    | 
 start_date       | timestamp with time zone | not null
 subscription_id  | integer                  | not null
Indexes:
    "subscription_pkey" PRIMARY KEY, btree (subscription_id)


This one has about 1.4M rows.  It's kind of a log of pages visited.

                                               Table 
"public.user_tracking"
      Column      |            Type             |                                
Modifiers                                 
------------------+-----------------------------+------------------------
--------------------------------------------------
 action           | character varying(255)      | not null
 entry_date       | timestamp without time zone | not null
 note             | text                        | 
 report_id        | integer                     | 
 session_id       | character varying(255)      | not null
 user_id          | integer                     | 
 user_tracking_id | integer                     | not null default 
nextval('user_tracking_user_tracking_id_seq'::regclass)
Indexes:
    "user_tracking_pkey" PRIMARY KEY, btree (user_tracking_id)
    "user_tracking_entry_date_idx" btree (entry_date)
    "user_tracking_month_idx" btree (date_part('month'::text, 
entry_date))
    "user_tracking_quarter_idx" btree (date_part('quarter'::text, 
entry_date))
    "user_tracking_report_id_idx" btree (report_id)
    "user_tracking_session_idx" btree (session_id)
    "user_tracking_user_id_idx" btree (user_id)
    "user_tracking_year_idx" btree (date_part('year'::text, entry_date))
Foreign-key constraints:
    "user_tracking_report_id_fkey" FOREIGN KEY (report_id) REFERENCES 
article(article_id) DEFERRABLE INITIALLY DEFERRED
    "user_tracking_user_id_fkey" FOREIGN KEY (user_id) REFERENCES 
extended_user(user_id) DEFERRABLE INITIALLY DEFERRED

Responses

pgsql-performance by date

Next:From: Alvaro HerreraDate: 2006-08-27 02:45:54
Subject: Re: stats reset during pg_restore?
Previous:From: George PavlovDate: 2006-08-26 17:13:19
Subject: stats reset during pg_restore?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group