Re: a path towards replacing GEQO with something better

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: a path towards replacing GEQO with something better
Date: 2021-06-15 18:15:56
Message-ID: CAFBsxsFYUiRGy5wVfhieKXPKW=ooFpwkpa9rS6qB=Mr4bZZ-8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 15, 2021 at 12:15 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Jun 9, 2021 at 9:24 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
wrote:
> > 3) It actually improves the existing exhaustive search, because the
complexity of the join order problem depends on the query shape: a "chain"
shape (linear) vs. a "star" shape (as in star schema), for the most common
examples. The size of the DP table grows like this (for n >= 4):
...
> I don't quite understand the difference between the "chain" case and
> the "star" case. Can you show sample queries for each one? e.g. SELECT
> ... FROM a_1, a_2, ..., a_n WHERE <something>?

There's a very simple example in the optimizer README:

--
SELECT *
FROM tab1, tab2, tab3, tab4
WHERE tab1.col = tab2.col AND
tab2.col = tab3.col AND
tab3.col = tab4.col

Tables 1, 2, 3, and 4 are joined as:
{1 2},{2 3},{3 4}
{1 2 3},{2 3 4}
{1 2 3 4}
(other possibilities will be excluded for lack of join clauses)

SELECT *
FROM tab1, tab2, tab3, tab4
WHERE tab1.col = tab2.col AND
tab1.col = tab3.col AND
tab1.col = tab4.col

Tables 1, 2, 3, and 4 are joined as:
{1 2},{1 3},{1 4}
{1 2 3},{1 3 4},{1 2 4}
{1 2 3 4}
--

The first one is chain, and the second is star. Four is the smallest set
where we have a difference. I should now point out an imprecision in my
language: By "size of the DP table", the numbers in my first email refer to
the number of joinrels times the number of possible joins (not paths, and
ignoring commutativity). Here are the steps laid out, with cumulative
counts:

join_level, # joins, cumulative # joins:

linear, n=4
2 3 3
3 4 7
4 3 10

star, n=4
2 3 3
3 6 9
4 3 12

And of course, the chain query also has three at the last level, because it
tries two left- (or right-) deep joins and one bushy join.

> One idea I just ran across in
>
https://15721.courses.cs.cmu.edu/spring2020/papers/22-costmodels/p204-leis.pdf
> is to try to economize by skipping consideration of bushy plans.

That's a good paper, and it did influence my thinking.

You likely already know this, but for the archives: If only chain queries
could have bushy plans, it wouldn't matter because they are so cheap to
enumerate. But, since star queries can introduce a large number of extra
joins via equivalence (same join column or FK), making them resemble
"clique" queries, bushy joins get excessively numerous.

> We
> could start doing that when some budget is exceeded, similar to what
> you are proposing here, but probably the budget for skipping
> consideration of bushy plans would be smaller than the budget for
> switching to IDP. The idea of skipping bushy plan generation in some
> cases makes sense to me intuitively because most of the plans
> PostgreSQL generates are mostly left-deep, and even when we do
> generate bushy plans, they're not always a whole lot better than the
> nearest equivalent left-deep plan. The paper argues that considering
> bushy plans makes measurable gains, but also that failure to consider
> such plans isn't catastrophically bad.

I think that makes sense. There are a few things we could do within the
"grey zone" -- too many rels to finish exhaustive search, but not enough to
justify starting directly with the greedy step -- to increase our chances
of completing, and that's a very simple one.

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-06-15 18:20:51 Re: snapshot too old issues, first around wraparound and then more.
Previous Message Peter Geoghegan 2021-06-15 18:04:10 Re: disfavoring unparameterized nested loops