Optimizer speed and GEQO (was: nested loops in joins)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Optimizer speed and GEQO (was: nested loops in joins)
Date: 1999-02-02 00:17:30
Message-ID: 21331.917914650@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have been looking at optimizer runtime using Charles Hornberger's
example case, and what I find is that the number of tables involved in
the query is a very inadequate predictor of the optimizer's runtime.
Thus, it's not surprising that we are getting poor results from using
number of tables as the GEQO threshold measure.

I started with Charles' test case as presented, then simplified it by
removing indexes from the database. This didn't change the final
plan, which was a straight nested loop over sequential scans anyway
(because the tables were so small). But it drastically reduces the
number of cases that the optimizer looks at.

Original test case: query involves seven tables with a total of twelve
indexes (six on the primary table and one on each other table).

7-index test case: I removed all but one of the indexes on the primary
table, leaving one index per table.

No-index test case: I removed all indexes.

It took a certain amount of patience to run these cases with profiling
turned on :-(, but here are the results:

7T+12I 7T+7I 7T

Runtime, sec 1800 457 59
equal() calls 585 mil 157 mil 21.7 mil
better_path() calls 72546 37418 14025
bp->path_is_cheaper 668 668 668
create_hashjoin_path 198 198 198
create_mergejoin_path 2358 723 198
create_nestloop_path 38227 20297 7765

Next, I removed the last table from Charles' query, producing these
cases:

6T+11I 6T+6I 6T

Runtime, sec 34 12 2.3
equal() calls 14.3 mil 4.7 mil 0.65 mil
better_path() calls 10721 6172 2443
bp->path_is_cheaper 225 225 225
create_hashjoin_path 85 85 85
create_mergejoin_path 500 236 85
create_nestloop_path 5684 3344 1354

The 5T+10I case is down to a couple of seconds of runtime, so I
didn't bother to do profiles for 5 tables.

A fairly decent approximation is that the runtime varies as the
square of the number of create_nestloop_path calls. That number
in turn seems to vary as the factorial of the number of tables,
with a weaker but still strong term involving the number of indexes.
I understand the square and the factorial terms, but the form of
the dependency on the index count isn't real clear.

It might be worthwhile to try a GEQO threshold based on the number of
tables plus half the number of indexes on those tables. I have no idea
where to find the number of indexes, so I'll just offer the idea for
someone else to try.

The main thing that jumps out from the profiles is that on these complex
searches, the optimizer spends practically *all* of its time down inside
better_path, scanning the list of already known access paths to see if a
proposed new path is a duplicate of one already known. (Most of the time
it is not, as shown by the small number of times path_is_cheaper gets
called from better_path.) This scanning is O(N^2) in the number of paths
considered, whereas it seems that all the other work is no worse than O(N)
in the number of paths. The bottom of the duplicate check is equal(),
which as you can see is getting called a horrendous number of times.

It would be worth our while to try to eliminate this mostly-unsuccessful
comparison operation.

I wonder whether it would work to simply always add the new path to the
list, and rely on the later "prune" step to get rid of duplicates?
The prune code is evidently far more efficient at getting rid of useless
entries than better_path is, because it's nowhere near the top of the
profile even though it's processing nearly as many paths.

Anybody here know much about how the optimizer works? I'm hesitant to
hack on it myself.

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 1999-02-02 00:46:58 Re: [HACKERS] Small patches in copy.c and trigger.c
Previous Message Bruce Momjian 1999-02-01 23:33:51 Re: [HACKERS] Patches