Re: improving my query plan

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Kevin Kempter <kevink(at)consistentstate(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: improving my query plan
Date: 2009-08-21 01:33:56
Message-ID: C6B34794.F4EF%scott@richrelevance.com
Lists: pgsql-performance

On 8/20/09 4:09 PM, "Kevin Kempter" <kevink(at)consistentstate(dot)com> wrote:

> Hi all;
>
>
> I have a simple query against two very large tables (> 800 million rows in
> the url_hits_category_jt table and 9.2 million in the url_hits_klk1 table).
>
>
> I have indexes on the join columns and I've run an explain. Also, I've set
> the statistics target to 250 for both join columns. I get a very high
> overall query cost:
>
>

What about the actual times? The latter plan has the higher estimated cost,
but perhaps it is actually faster. If so, you can bring the estimates in line
with reality by changing the db cost parameters.
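
For example, a minimal sketch of that kind of experiment (the values below
are illustrative assumptions, not tuned recommendations, and the SETs only
affect the current session):

    SET seq_page_cost = 1.0;           -- the baseline the other costs are relative to
    SET random_page_cost = 2.0;        -- default 4.0; lower it if data is mostly cached
    SET effective_cache_size = '8GB';  -- rough guess: shared_buffers plus OS cache
    -- then re-run the EXPLAIN and see whether the estimated costs track reality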

However, the second plan will surely be slower if the table is not in memory
and causes random disk access.
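
A quick way to check is to compare the on-disk sizes against your RAM; a
minimal sketch (table names taken from the plans quoted below;
pg_total_relation_size includes indexes and TOAST):

    SELECT pg_size_pretty(pg_total_relation_size('pwreport.url_hits_category_jt')),
           pg_size_pretty(pg_total_relation_size('url_hits_klk1'));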

Note that EXPLAIN ANALYZE for the hash plan will take noticeably longer than
the plain query does, due to the per-node timing overhead it adds on top of
the hash work.
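
Something along these lines would show the actual times for both plans (same
query as quoted below; enable_seqscan is toggled only in the session to force
the merge-join plan):

    EXPLAIN ANALYZE
    SELECT category_id, url_hits_id
    FROM url_hits_klk1 a, pwreport.url_hits_category_jt b
    WHERE a.id = b.url_hits_id;

    SET enable_seqscan = off;   -- forces the second (merge join) plan
    EXPLAIN ANALYZE
    SELECT category_id, url_hits_id
    FROM url_hits_klk1 a, pwreport.url_hits_category_jt b
    WHERE a.id = b.url_hits_id;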

>
>
> explain
> select
> category_id,
> url_hits_id
> from
> url_hits_klk1 a ,
> pwreport.url_hits_category_jt b
> where
> a.id = b.url_hits_id
> ;
>                                          QUERY PLAN
> ----------------------------------------------------------------------------------------------
>  Hash Join  (cost=296959.90..126526916.55 rows=441764338 width=8)
>    Hash Cond: (b.url_hits_id = a.id)
>    ->  Seq Scan on url_hits_category_jt b  (cost=0.00..62365120.22 rows=4323432222 width=8)
>    ->  Hash  (cost=179805.51..179805.51 rows=9372351 width=4)
>          ->  Seq Scan on url_hits_klk1 a  (cost=0.00..179805.51 rows=9372351 width=4)
> (5 rows)
>
>
> If I turn off sequential scans I still get an even higher query cost:
>
>
> set enable_seqscan = off;
> SET
> explain
> select
> category_id,
> url_hits_id
> from
> url_hits_klk1 a ,
> pwreport.url_hits_category_jt b
> where
> a.id = b.url_hits_id
> ;
>                                                             QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=127548504.83..133214707.19 rows=441791932 width=8)
>    Merge Cond: (a.id = b.url_hits_id)
>    ->  Index Scan using klk1 on url_hits_klk1 a  (cost=0.00..303773.29 rows=9372351 width=4)
>    ->  Index Scan using mt_url_hits_category_jt_url_hits_id_index on url_hits_category_jt b  (cost=0.00..125058243.39 rows=4323702284 width=8)
> (4 rows)
>
> Thoughts?
>
> Thanks in advance
