Skip site navigation (1) Skip section navigation (2)

Re: TB-sized databases

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: TB-sized databases
Date: 2007-12-07 01:28:24
Message-ID: 4758A1B8.20501@cheapcomplexdevices.com (view raw or flat)
Thread:
Lists: pgsql-performance
Tom Lane wrote:
> Michael Stone <mstone+postgres(at)mathom(dot)us> writes:
>> OTOH, the planner can really screw up queries on really large databases. 
>> ... I've got some queries that the 
>> planner thinks will return on the order of 10^30 rows for that sort of 
>> reason. In practice, the query may return 10^3 rows....
> 
> Indeed, and if you've got examples where it's that far off, you should
> report them.

If I read this right, I've got quite a few cases where the planner
expects 1 row but gets over 2000.

And within the plan, it looks like there's a step where it expects
511 rows and gets 2390779 which seems to be off by a factor of 4600x.

Also shown below it seems that if I use "OFFSET 0" as a "hint"
I can force a much (10x) better plan.  I wonder if there's room for
a pgfoundry project for a patch set that lets us use more hints
than OFFSET 0.

    Ron

logs=# analyze;
ANALYZE
logs=# explain analyze  select * from fact natural join d_ref natural join d_uag where ref_host = 'download.com.com' and ref_path = '/[path_removed].html' and useragent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)';
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2827.72..398919.05 rows=1 width=242) (actual time=69175.963..141550.628 rows=2474 loops=1)
   Hash Cond: (fact.ref_id = d_ref.ref_id)
   ->  Nested Loop  (cost=2819.88..398908.65 rows=511 width=119) (actual time=3094.740..139361.235 rows=2390779 loops=1)
         ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=45.937..45.948 rows=1 loops=1)
               Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)'::text)
         ->  Bitmap Heap Scan on fact  (cost=2819.88..396449.49 rows=196223 width=32) (actual time=3048.770..135653.875 rows=2390779 loops=1)
               Recheck Cond: (fact.uag_id = d_uag.uag_id)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=1713.148..1713.148 rows=2390779 loops=1)
                     Index Cond: (fact.uag_id = d_uag.uag_id)
   ->  Hash  (cost=7.83..7.83 rows=1 width=127) (actual time=62.841..62.841 rows=2 loops=1)
         ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=62.813..62.823 rows=2 loops=1)
               Index Cond: (((ref_path)::text = '[path_removed].html'::text) AND ((ref_host)::text = 'download.com.com'::text))
 Total runtime: 141563.733 ms
(13 rows)



############ using "offset 0" to force a better plan.


logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host = 'download.com.com' and ref_path = '/[path_removed].html' and useragent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)' offset 0) as a;
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=6465.12..7575.91 rows=367 width=2096) (actual time=2659.251..14703.343 rows=2474 loops=1)
   ->  Limit  (cost=0.00..14.22 rows=1 width=218) (actual time=114.968..115.140 rows=2 loops=1)
         ->  Nested Loop  (cost=0.00..14.22 rows=1 width=218) (actual time=114.964..115.127 rows=2 loops=1)
               ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=75.891..75.900 rows=2 loops=1)
                     Index Cond: (((ref_path)::text = '[path_removed].html'::text) AND ((ref_host)::text = 'download.com.com'::text))
               ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=19.582..19.597 rows=1 loops=2)
                     Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)'::text)
   ->  Bitmap Heap Scan on fact  (cost=6465.12..7556.18 rows=367 width=32) (actual time=2240.090..7288.145 rows=1237 loops=2)
         Recheck Cond: ((fact.uag_id = a.uag_id) AND (fact.ref_id = a.ref_id))
         ->  BitmapAnd  (cost=6465.12..6465.12 rows=367 width=0) (actual time=2221.539..2221.539 rows=0 loops=2)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=1633.032..1633.032 rows=2390779 loops=2)
                     Index Cond: (fact.uag_id = a.uag_id)
               ->  Bitmap Index Scan on i__fact__ref_id  (cost=0.00..3581.50 rows=253913 width=0) (actual time=150.614..150.614 rows=77306 loops=2)
                     Index Cond: (fact.ref_id = a.ref_id)
 Total runtime: 14710.870 ms
(15 rows)


In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2007-12-07 01:55:02
Subject: Re: TB-sized databases
Previous:From: kelvanDate: 2007-12-07 00:45:26
Subject: database tuning

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group