Re: Bunch o' dead code in GEQO

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bunch o' dead code in GEQO
Date: 2004-01-22 16:51:55
Message-ID: 12735.1074790315@sss.pgh.pa.us
Lists: pgsql-hackers

"scott.marlowe" <scott(dot)marlowe(at)ihs(dot)com> writes:
> On Thu, 22 Jan 2004, Tom Lane wrote:
>> I'm assuming that the original author of the GEQO code already did that
>> testing ...

> Hmmm. I was figuring he wasn't sure so he left them in for other people
> to test. Is this a part of the code that eats up much time, or something
> simple and fast that isn't part of the "GEQO takes 8 seconds to plan"
> problem?

Well, the basic plan of the GEQO code is

Step 1: generate a bunch of possible join paths at random.

Step 2: randomly select a pair of paths from the current population,
generate a new path that is some combination of these, and push it back
into the population, dropping the worst path from the population.
Repeat for a bunch of generations.

Step 3: take the best path in the final population.
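The steady-state loop described above can be sketched as follows. This is a schematic Python illustration of the three steps, not the actual C code in src/backend/optimizer/geqo; the cost function is supplied by the caller, and `recombine` is a placeholder standing in for whichever crossover operator is in use:

```python
import random

def recombine(mom, dad):
    # Placeholder recombiner (order-crossover style): take the first half
    # of one parent, then the remaining relations in the other parent's
    # order.  The real code offers several TSP-derived operators instead.
    cut = len(mom) // 2
    head = mom[:cut]
    tail = [g for g in dad if g not in head]
    return head + tail

def geqo_sketch(num_rels, pool_size, generations, cost):
    # Step 1: generate a random initial population of join orders.
    pool = [random.sample(range(num_rels), num_rels)
            for _ in range(pool_size)]
    pool.sort(key=cost)                 # cheapest plan first

    # Step 2: repeatedly pick two parents at random, combine them,
    # push the child back in, and drop the worst member of the pool.
    for _ in range(generations):
        mom, dad = random.sample(pool, 2)
        pool.append(recombine(mom, dad))
        pool.sort(key=cost)
        pool.pop()                      # discard the worst path

    # Step 3: take the best path in the final population.
    return pool[0]
```

Note that every appended child costs one path evaluation, which is where the runtime observation below comes from: the total work is one evaluation per initial member plus one per generation.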

The different recombination algorithms are simply different ways of
generating a "child" path from two "parent" paths in step 2. Changing
them wouldn't affect the runtime noticeably at all --- the primary cost
is in evaluating each generated path, which is why the runtime is
roughly proportional to the sum of the population size (step 1) and the
number of generations (step 2). Possibly a different recombiner would
give a better chance of finding a good plan, but I'm unconvinced.
Arguably the recombiners that are there are all wrong anyway, since they
were all invented to solve Traveling Salesman problems, which this is
not quite.
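For reference, one of the classic TSP recombiners alluded to here is edge recombination crossover, which tries to build a child tour out of edges that appear in either parent. The sketch below is the generic textbook algorithm in Python, not PostgreSQL's implementation:

```python
import random

def edge_recombination(mom, dad):
    # Build the edge map: for each relation, the set of neighbors it has
    # in either parent tour (treating tours as cycles).
    edges = {g: set() for g in mom}
    for tour in (mom, dad):
        for i, g in enumerate(tour):
            edges[g].add(tour[i - 1])
            edges[g].add(tour[(i + 1) % len(tour)])

    current = mom[0]
    child = [current]
    while len(child) < len(mom):
        # Remove the visited relation from every neighbor list.
        for nbrs in edges.values():
            nbrs.discard(current)
        candidates = edges.pop(current)
        if candidates:
            # Prefer the neighbor with the fewest remaining edges, so we
            # are less likely to strand it later.
            current = min(candidates, key=lambda g: len(edges[g]))
        else:
            # Dead end: fall back to a random unvisited relation.
            current = random.choice(list(edges))
        child.append(current)
    return child
```

The point of the operator is that a child inherits adjacencies (edges) from its parents, which is exactly the property that matters for TSP tours; whether it is the right property for join orders is the doubt expressed above.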

The only way we can do much about the runtime is to reduce the default
population size. With the current default parameters, a large query
will have population size 1024 and 400 generations, so about two-thirds
of the runtime is in generating the initial random population; if we
can't make a dent in that then we're not going to gain much. The
question is what this will do to the average quality of the selected
plans. One thing I'm thinking about is trying to improve the quality of
the initial population by immediately discarding any really bad plans
(for instance, use a heuristic that pays attention to which relations
are linked by WHERE clauses).
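One way that seeding idea might look, very roughly: reject any randomly generated join order that would force a relation to be joined to a neighbor it shares no join clause with. This is a hypothetical illustration only; the predicate, the adjacency test, and the retry loop are all made up for the sketch and are not how the planner actually models join clauses:

```python
import random

def has_no_cross_joins(tour, joined):
    # Hypothetical heuristic: accept a join order only if every
    # consecutive pair of relations is linked by some WHERE/JOIN clause,
    # so no step degenerates into a Cartesian product.
    return all((a, b) in joined or (b, a) in joined
               for a, b in zip(tour, tour[1:]))

def seeded_population(num_rels, pool_size, joined, max_tries=100_000):
    # Generate random join orders but immediately discard really bad
    # ones, keeping only tours that pass the heuristic above.
    pool = []
    for _ in range(max_tries):
        if len(pool) == pool_size:
            break
        tour = random.sample(range(num_rels), num_rels)
        if has_no_cross_joins(tour, joined):
            pool.append(tour)
    return pool
```

With such a filter, the evaluation budget spent on the initial population (the two-thirds of the runtime mentioned above) would at least be spent on plans that have a chance of being competitive.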

regards, tom lane
