Skip site navigation (1) Skip section navigation (2)

Re: TB-sized databases

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: TB-sized databases
Date: 2008-03-18 17:33:26
Message-ID: 47DFFCE6.9040408@cheapcomplexdevices.com (view raw or flat)
Thread:
Lists: pgsql-performance
Tom Lane wrote:
> Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> writes:
>> Would another possible condition for considering
>> Cartesian joins be be:
> 
>>    * Consider Cartesian joins when a unique constraint can prove
>>      that at most one row will be pulled from one of the tables
>>      that would be part of this join?
> 
> What for?  That would still lead us to consider large numbers of totally
> useless joins.
> 
> 			regards, tom lane

Often I get order-of-magnitude better queries by forcing the cartesian
join even without multi-column indexes.

Explain analyze results below.



Here's an example with your typical star schema.
   fact is the central fact table.
   d_ref is a dimension table for the referrer
   d_uag is a dimension table for the useragent.

Forcing the cartesan join using "offset 0" makes
the the query take 14 ms (estimated cost 7575).

If I don't force the cartesian join the query takes
over 100ms (estimated cost 398919).

Indexes are on each dimension; but no multi-column
indexes (since the ad-hoc queries can hit any permutation
of dimensions).

logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host = 'www.real.com' and ref_path = '/products/player/more_info/moreinfo.html' and ref_query = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV=' and useragent = 'Mozilla/4.08 [en] (WinNT; U ;Nav)' offset 0 ) as a;
                                                                                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=6465.12..7575.91 rows=367 width=2096) (actual time=14.152..14.192 rows=4 loops=1)
    ->  Limit  (cost=0.00..14.22 rows=1 width=218) (actual time=0.084..0.102 rows=1 loops=1)
          ->  Nested Loop  (cost=0.00..14.22 rows=1 width=218) (actual time=0.082..0.096 rows=1 loops=1)
                ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=0.056..0.058 rows=1 loops=1)
                      Index Cond: (((ref_path)::text = '/products/player/more_info/moreinfo.html'::text) AND ((ref_host)::text = 'www.real.com'::text) AND ((ref_query)::text = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV='::text))
                ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.020..0.029 rows=1 loops=1)
                      Index Cond: ((useragent)::text = 'Mozilla/4.08 [en] (WinNT; U ;Nav)'::text)
    ->  Bitmap Heap Scan on fact  (cost=6465.12..7556.18 rows=367 width=32) (actual time=14.053..14.066 rows=4 loops=1)
          Recheck Cond: ((fact.uag_id = a.uag_id) AND (fact.ref_id = a.ref_id))
          ->  BitmapAnd  (cost=6465.12..6465.12 rows=367 width=0) (actual time=14.016..14.016 rows=0 loops=1)
                ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=2.258..2.258 rows=7960 loops=1)
                      Index Cond: (fact.uag_id = a.uag_id)
                ->  Bitmap Index Scan on i__fact__ref_id  (cost=0.00..3581.50 rows=253913 width=0) (actual time=9.960..9.960 rows=13751 loops=1)
                      Index Cond: (fact.ref_id = a.ref_id)
  Total runtime: 14.332 ms
(15 rows)

logs=#



logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host = 'www.real.com' and ref_path = '/products/player/more_info/moreinfo.html' and ref_query = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV=' and useragent = 'Mozilla/4.08 [en] (WinNT; U ;Nav)' ) as a;
                                                                                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=2827.72..398919.05 rows=1 width=242) (actual time=78.777..107.038 rows=4 loops=1)
    Hash Cond: (fact.ref_id = d_ref.ref_id)
    ->  Nested Loop  (cost=2819.88..398908.65 rows=511 width=119) (actual time=6.311..101.843 rows=7960 loops=1)
          ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=0.021..0.029 rows=1 loops=1)
                Index Cond: ((useragent)::text = 'Mozilla/4.08 [en] (WinNT; U ;Nav)'::text)
          ->  Bitmap Heap Scan on fact  (cost=2819.88..396449.49 rows=196223 width=32) (actual time=6.273..91.645 rows=7960 loops=1)
                Recheck Cond: (fact.uag_id = d_uag.uag_id)
                ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual time=5.117..5.117 rows=7960 loops=1)
                      Index Cond: (fact.uag_id = d_uag.uag_id)
    ->  Hash  (cost=7.83..7.83 rows=1 width=127) (actual time=0.069..0.069 rows=1 loops=1)
          ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=0.059..0.062 rows=1 loops=1)
                Index Cond: (((ref_path)::text = '/products/player/more_info/moreinfo.html'::text) AND ((ref_host)::text = 'www.real.com'::text) AND ((ref_query)::text = '?ID=370&DC=&LANG=&PN=RealOne%20Player&PV=6.0.11.818&PT=&OS=&CM=&CMV=&LS=&RE=&RA=&RV='::text))
  Total runtime: 107.193 ms
(13 rows)

In response to

Responses

pgsql-performance by date

Next:From: Andrej Ricnik-BayDate: 2008-03-18 17:44:48
Subject: Re: What is the best way to storage music files in Postgresql
Previous:From: MatthewDate: 2008-03-18 16:24:27
Subject: Re: Planner mis-estimation using nested loops followup

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group