Re: Query tuning

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Kevin Kempter <kevink(at)consistentstate(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query tuning
Date: 2009-08-19 17:17:26
Message-ID: C6B181B6.F2DD%scott@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 8/19/09 9:28 AM, "Kevin Kempter" <kevink(at)consistentstate(dot)com> wrote:

> Hi all;
>
> we've been fighting this query for a few days now. we bumped up the statistict
> target for the a.id , c.url_hits_id and the b.id columns below to 250 and ran
> an analyze on the relevant tables. we killed it after 8hrs.
>
> Note the url_hits table has > 1.4billion rows
>
> Any suggestions?
>

Have you tried setting work_mem higher for just this query?

The big estimated cost is the sequential scan on url_hits. But in reality,
if the estimates are off the sort and index scan at the end might be your
bottleneck. Larger work_mem might make it choose another plan there.

But if the true cost is the sequential scan on url_hits, then only an index
there will help.

>
>
> $ psql -ef expl.sql pwreport
> explain
> select
> a.id,
> ident_id,
> time,
> customer_name,
> extract('day' from timezone(e.name, to_timestamp(a.time))) as day,
> category_id
> from
> pwreport.url_hits a left outer join
> pwreport.url_hits_category_jt c on (a.id = c.url_hits_id),
> pwreport.ident b,
> pwreport.timezone e
> where
> a.ident_id = b.id
> and b.timezone_id = e.id
> and time >= extract ('epoch' from timestamp '2009-08-12')
> and time < extract ('epoch' from timestamp '2009-08-13' )
> and direction = 'REQUEST'
> ;
>
> QUERY
> PLAN
> ------------------------------------------------------------------------------
> ------------------------------------------------------------------------------
> --------------------------------------------------------
> Merge Right Join (cost=47528508.61..180424544.59 rows=10409251 width=53)
> Merge Cond: (c.url_hits_id = a.id)
> -> Index Scan using mt_url_hits_category_jt_url_hits_id_index on
> url_hits_category_jt c (cost=0.00..122162596.63 rows=4189283233 width=8)
> -> Sort (cost=47528508.61..47536931.63 rows=3369210 width=49)
> Sort Key: a.id
> -> Hash Join (cost=2565.00..47163219.21 rows=3369210 width=49)
> Hash Cond: (b.timezone_id = e.id)
> -> Hash Join (cost=2553.49..47116881.07 rows=3369210
> width=37)
> Hash Cond: (a.ident_id = b.id)
> -> Seq Scan on url_hits a (cost=0.00..47051154.89
> rows=3369210 width=12)
> Filter: ((direction =
> 'REQUEST'::proxy_direction_enum) AND (("time")::double precision >=
> 1250035200::double precision) AND (("time")::double precision <
> 1250121600::double precision))
> -> Hash (cost=2020.44..2020.44 rows=42644 width=29)
> -> Seq Scan on ident b (cost=0.00..2020.44
> rows=42644 width=29)
> -> Hash (cost=6.78..6.78 rows=378 width=20)
> -> Seq Scan on timezone e (cost=0.00..6.78 rows=378
> width=20)
> (15 rows)
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Nikolas Everett 2009-08-19 17:31:30 Re: Query tuning
Previous Message Grzegorz Jaśkiewicz 2009-08-19 16:38:08 Re: Query tuning