Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jim Nasby <jim(dot)nasby(at)openscg(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
Date: 2018-08-24 00:20:31
Message-ID: CAH2-WzkFm4KBWAkDBiN2nOaXLDJ7hLt=+dY3bc10fe266E+=1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 23, 2018 at 11:10 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Rebased up to HEAD, per cfbot nagging. Still no substantive change from
> v2.

I happened to have the opportunity to talk to Tom about this patch in
person. I expressed some very general concerns that are worth
repeating publicly.

This patch adds an enhancement that is an example of a broader class
of optimizer enhancement primarily aimed at making star-schema queries
have more efficient plans, by arranging to use several independent
nested loop joins based on a common pattern. Each nestloop join has
one particular dimension table on the outer side, and the fact table
on the inner side. The query plan is not so much a tree as it is a DAG
(directed acyclic graph), because the fact table is visited multiple
times. (There are already cases in Postgres in which the query plan is
technically a DAG, actually, but it could be taken much further.)

Aside from being inherently more efficient, DAG-like star schema plans
are also *ideal* targets for parallel query. The executor can execute
each nested loop join in a parallel worker with minimal contention --
the inner side of each nestloop join all probe a different fact table
index to the others. It's almost like executing several different
simple queries concurrently, with some serial processing at the end.
Even that serial processing can sometimes be minimized by having some
of the parallel workers use a Bloom filter in shared memory.

Tom is already concerned that the optimization added by this patch may
be too much of a special case, which is understandable. It may be that
we're failing to identify some greater opportunity to add DAG-like
plans for star schema queries.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2018-08-24 00:30:06 Re: Removing useless DISTINCT clauses
Previous Message Stephen Frost 2018-08-24 00:05:01 Re: Removing useless DISTINCT clauses