Re: Should Oracle outperform PostgreSQL on a complex

From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Pål Stenslet <paal(dot)stenslet(at)exie(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Should Oracle outperform PostgreSQL on a complex
Date: 2005-12-18 22:13:55
Message-ID: 43A5DF23.5010504@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Simon Riggs wrote:
> On Sat, 2005-12-17 at 13:13 -0500, Tom Lane wrote:
>
>>Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>>
>>>On Fri, 2005-12-16 at 23:28 -0500, Bruce Momjian wrote:
>>>
>>>>How are star joins different from what we do now?
>>
>>>Methods:
>>>1. join all N small tables together in a cartesian product, then join to
>>>main Large table once (rather than N times)
>>
>>Of course, the reason the current planner does not think of this is that
>>it does not consider clauseless joins unless there is no alternative.
>
>
> Understood
>
>
>> The plan you currently get for
>>this sort of scenario is typically a nest of hash joins:
>>
>> QUERY PLAN
>>------------------------------------------------------------------------
>> Hash Join (cost=2.25..4652.25 rows=102400 width=16)
>> Hash Cond: ("outer".f1 = "inner".f1)
>> -> Hash Join (cost=1.12..3115.12 rows=102400 width=12)
>> Hash Cond: ("outer".f2 = "inner".f1)
>> -> Seq Scan on fact (cost=0.00..1578.00 rows=102400 width=8)
>> -> Hash (cost=1.10..1.10 rows=10 width=4)
>> -> Seq Scan on d2 (cost=0.00..1.10 rows=10 width=4)
>> -> Hash (cost=1.10..1.10 rows=10 width=4)
>> -> Seq Scan on d1 (cost=0.00..1.10 rows=10 width=4)
>>(9 rows)
>>
>>This involves only one scan of the fact table. As each row is pulled up
>>through the nest of hash joins, we hash one dimension key and join to
>>one small table at each level.
>
>
> Understood
>
>
>>This is at worst the same amount of work
>>as hashing all the keys at once and probing a single cartesian-product
>>hashtable, probably less work (fewer wasted key-comparisons). And
>>definitely less memory. You do have to keep your eye on the ball that
>>you don't waste a lot of overhead propagating the row up through
>>multiple join levels, but we got rid of most of the problem there in
>>8.1 via the introduction of "virtual tuple slots". If this isn't fast
>>enough yet, it'd make more sense to invest effort in further cutting the
>>executor's join overhead (which of course benefits *all* plan types)
>>than in trying to make the planner choose a star join.
>
>
> That join type is used when an index-organised table is available, so
> that a SeqScan of the larger table can be avoided.
>
> I'd say the plan would make sense if the columns of the cartesian
> product match a multi-column index on the larger table that would not
> ever be used unless sufficient columns are restricted in each lookup.
> That way you are able to avoid the SeqScan that occurs for the multiple
> nested Hash Join case. (Clearly, normal selectivity rules apply on the
> use of the index in this way).
>
> So I think that plan type still can be effective in some circumstances.
> Mind you: building an N-way index on a large table isn't such a good
> idea, unless you can partition the tables and still use a join. Which is
> why I've not centred on this case as being important before now.
>
> My understanding: Teradata and DB2 use this.
>

FWIW - I think DB2 uses the successive fact RID buildup (i.e method 2),
unfortunately I haven't got a working copy of DB2 in front of me to test.

Cheers

Mark

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Simon Riggs 2005-12-18 22:21:04 Re: Should Oracle outperform PostgreSQL on a complex
Previous Message Mark Kirkwood 2005-12-18 22:10:01 Re: Should Oracle outperform PostgreSQL on a complex