Re: TB-sized databases

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: TB-sized databases
Date: 2007-12-07 01:28:24
Message-ID: 4758A1B8.20501@cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> Michael Stone <mstone+postgres(at)mathom(dot)us> writes:
>> OTOH, the planner can really screw up queries on really large databases.
>> ... I've got some queries that the
>> planner thinks will return on the order of 10^30 rows for that sort of
>> reason. In practice, the query may return 10^3 rows....
>
> Indeed, and if you've got examples where it's that far off, you should
> report them.

If I read this right, I've got quite a few cases where the planner
expects 1 row but gets over 2000.

And within the plan, it looks like there's a step where it expects
511 rows and gets 2390779 which seems to be off by a factor of 4600x.

Also shown below it seems that if I use "OFFSET 0" as a "hint"
I can force a much (10x) better plan. I wonder if there's room for
a pgfoundry project for a patch set that lets us use more hints
than OFFSET 0.

Ron

logs=# analyze;
ANALYZE
logs=# explain analyze select * from fact natural join d_ref natural join d_uag where ref_host = 'download.com.com' and ref_path = '/[path_removed].html' and useragent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=2827.72..398919.05 rows=1 width=242) (actual time=69175.963..141550.628 rows=2474 loops=1)
Hash Cond: (fact.ref_id = d_ref.ref_id)
-> Nested Loop (cost=2819.88..398908.65 rows=511 width=119) (actual time=3094.740..139361.235 rows=2390779 loops=1)
-> Index Scan using i_uag__val on d_uag (cost=0.00..6.38 rows=1 width=91) (actual time=45.937..45.948 rows=1 loops=1)
Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)'::text)
-> Bitmap Heap Scan on fact (cost=2819.88..396449.49 rows=196223 width=32) (actual time=3048.770..135653.875 rows=2390779 loops=1)
Recheck Cond: (fact.uag_id = d_uag.uag_id)
-> Bitmap Index Scan on i__fact__uag_id (cost=0.00..2770.83 rows=196223 width=0) (actual time=1713.148..1713.148 rows=2390779 loops=1)
Index Cond: (fact.uag_id = d_uag.uag_id)
-> Hash (cost=7.83..7.83 rows=1 width=127) (actual time=62.841..62.841 rows=2 loops=1)
-> Index Scan using i_ref__val on d_ref (cost=0.00..7.83 rows=1 width=127) (actual time=62.813..62.823 rows=2 loops=1)
Index Cond: (((ref_path)::text = '[path_removed].html'::text) AND ((ref_host)::text = 'download.com.com'::text))
Total runtime: 141563.733 ms
(13 rows)

############ using "offset 0" to force a better plan.

logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host = 'download.com.com' and ref_path = '/[path_removed].html' and useragent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)' offset 0) as a;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=6465.12..7575.91 rows=367 width=2096) (actual time=2659.251..14703.343 rows=2474 loops=1)
-> Limit (cost=0.00..14.22 rows=1 width=218) (actual time=114.968..115.140 rows=2 loops=1)
-> Nested Loop (cost=0.00..14.22 rows=1 width=218) (actual time=114.964..115.127 rows=2 loops=1)
-> Index Scan using i_ref__val on d_ref (cost=0.00..7.83 rows=1 width=127) (actual time=75.891..75.900 rows=2 loops=1)
Index Cond: (((ref_path)::text = '[path_removed].html'::text) AND ((ref_host)::text = 'download.com.com'::text))
-> Index Scan using i_uag__val on d_uag (cost=0.00..6.38 rows=1 width=91) (actual time=19.582..19.597 rows=1 loops=2)
Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)'::text)
-> Bitmap Heap Scan on fact (cost=6465.12..7556.18 rows=367 width=32) (actual time=2240.090..7288.145 rows=1237 loops=2)
Recheck Cond: ((fact.uag_id = a.uag_id) AND (fact.ref_id = a.ref_id))
-> BitmapAnd (cost=6465.12..6465.12 rows=367 width=0) (actual time=2221.539..2221.539 rows=0 loops=2)
-> Bitmap Index Scan on i__fact__uag_id (cost=0.00..2770.83 rows=196223 width=0) (actual time=1633.032..1633.032 rows=2390779 loops=2)
Index Cond: (fact.uag_id = a.uag_id)
-> Bitmap Index Scan on i__fact__ref_id (cost=0.00..3581.50 rows=253913 width=0) (actual time=150.614..150.614 rows=77306 loops=2)
Index Cond: (fact.ref_id = a.ref_id)
Total runtime: 14710.870 ms
(15 rows)

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2007-12-07 01:55:02 Re: TB-sized databases
Previous Message kelvan 2007-12-07 00:45:26 database tuning