Skip site navigation (1) Skip section navigation (2)

Re: Should Oracle outperform PostgreSQL on a complex

From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Pål Stenslet <paal(dot)stenslet(at)exie(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Should Oracle outperform PostgreSQL on a complex
Date: 2005-12-18 02:02:48
Message-ID: 43A4C348.7050703@paradise.net.nz (view raw or flat)
Thread:
Lists: pgsql-performance
Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> 
>>On Fri, 2005-12-16 at 23:28 -0500, Bruce Momjian wrote:
>>
>>>How are star joins different from what we do now?
> 
> 
>>Methods:
>>1. join all N small tables together in a cartesian product, then join to
>>main Large table once (rather than N times)
> 
> 
> Of course, the reason the current planner does not think of this is that
> it does not consider clauseless joins unless there is no alternative.
> 
> However, I submit that it wouldn't pick such a plan anyway, and should
> not, because the idea is utterly stupid.  The plan you currently get for
> this sort of scenario is typically a nest of hash joins:
> 
>                                QUERY PLAN                               
> ------------------------------------------------------------------------
>  Hash Join  (cost=2.25..4652.25 rows=102400 width=16)
>    Hash Cond: ("outer".f1 = "inner".f1)
>    ->  Hash Join  (cost=1.12..3115.12 rows=102400 width=12)
>          Hash Cond: ("outer".f2 = "inner".f1)
>          ->  Seq Scan on fact  (cost=0.00..1578.00 rows=102400 width=8)
>          ->  Hash  (cost=1.10..1.10 rows=10 width=4)
>                ->  Seq Scan on d2  (cost=0.00..1.10 rows=10 width=4)
>    ->  Hash  (cost=1.10..1.10 rows=10 width=4)
>          ->  Seq Scan on d1  (cost=0.00..1.10 rows=10 width=4)
> (9 rows)
> 
> This involves only one scan of the fact table.  As each row is pulled up
> through the nest of hash joins, we hash one dimension key and join to
> one small table at each level.  This is at worst the same amount of work
> as hashing all the keys at once and probing a single cartesian-product
> hashtable, probably less work (fewer wasted key-comparisons).  And
> definitely less memory.  You do have to keep your eye on the ball that
> you don't waste a lot of overhead propagating the row up through
> multiple join levels, but we got rid of most of the problem there in
> 8.1 via the introduction of "virtual tuple slots".  If this isn't fast
> enough yet, it'd make more sense to invest effort in further cutting the
> executor's join overhead (which of course benefits *all* plan types)
> than in trying to make the planner choose a star join.
> 
> 
>>2. transform joins into subselects, then return subselect rows via an
>>index bitmap. Joins are performed via a bitmap addition process.
> 
> 
> This one might be interesting but it's not clear what you are talking
> about.  "Bitmap addition"?

Yeah - the quoted method of "make a cartesian product of the dimensions 
and then join to the fact all at once" is not actually used (as written) 
in many implementations - probably for the reasons you are pointing out. 
I found these two papers whilst browsing:


http://www.cs.brown.edu/courses/cs227/Papers/Indexing/O'NeilGraefe.pdf
http://www.dama.upc.edu/downloads/jaguilar-2005-4.pdf


They seem to be describing a more subtle method making use of join 
indexes and bitmapped indexes.

If I understand it correctly, the idea is to successively build up a 
list (hash / bitmap) of fact RIDS that will satisfy the query, and when 
complete actually perform the join and construct tuples. The goal being 
  similar in intent to the star join method (i.e. access the fact table 
as little and as "late" as possible), but avoiding the cost of actually 
constructing the dimension cartesian product.

cheers

Mark

In response to

Responses

pgsql-performance by date

Next:From: Harry JacksonDate: 2005-12-18 02:11:16
Subject: Re: PostgreSQL performance question.
Previous:From: Ben TrewernDate: 2005-12-18 01:10:21
Subject: Speed of different procedural language

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group