Re: join ordering via Simulated Annealing

From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: join ordering via Simulated Annealing
Date: 2010-01-08 14:41:42
Message-ID: 4B474426.8060906@wulczer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund wrote:
> On Wednesday 23 December 2009 02:23:55 Jan Urbański wrote:
>> Lastly, I'm lacking good testcases or even a testing approach:

> If you want to see some queries which are rather hard to plan with random
> search you can look at
> http://archives.postgresql.org/message-id/200907091700.43411.andres@anarazel.de
> which tom analyzed and improved here http://archives.postgresql.org/message-id/17807.1247932094@sss.pgh.pa.us

Here's the current state of affairs. I managed to make the module into
sufficiently good shape to at least not error out on Andres' example
query. I
tried it with SAIO, GEQO and the standard planner. from_collapse_limit and
join_collapse_limit were set to 14. This was on a --enable-debug
--disable-cassert -O2 build. Here are the results:

SAIO, standard values

saio_equilibrium_factor = 16
saio_temperature_reduction_factor = 0.95
time = too big

SAIO, tweaked

saio_equilibrium_factor = 12
saio_temperature_reduction_factor = 0.3
cost = 13376.10..19692.53
time = 86866.276 ms

GEQO

cost = 13936.53..19479.38
time = 182000.097 ms

STANDARD

cost = 17807.57..19339.64
time = 39361.248 ms

A couple of remarks. The standard planner found a decent plan and only ate
around 550 MB of memory while doing so. Reading this mail
http://archives.postgresql.org/pgsql-hackers/2009-07/msg01219.php
I was expecting it to start swapping, but for some reason it didn't.

GEQO produced a comparable (even if a bit better) plan to SAIO and took
quite
some more time. OTOH, with the default parameters, SAIO takes crazy
amounts of
time and I never had the patience to even wait until it finishes. Memory
usage
stayed low in both, because both of them do calculations in a separate
memory
context that gets periodically reset.

A short explanation on the saio_* parameter values. and the algorithm
itself.
The SAIO algorithm is composed of two loops:

do {
do {
random_move()
} while (!equilibrium())
reduce_temperature()
} while (!frozen())

The inner loop does random moves until it reaches "equilibrium". Moves that
improve the plan are always accepted, uphill moves are accepted with the
probability that's proportional to how much worse the new plan is and
how high
the current temperature is. The paper I quoted earlier that equilibrium is
reached after iterating the inner loop 16 * number_of_joins times. The "16"
parameter is what I called saio_equilibrium_factor. In each outer loop
iteration the temperature of the system is reduced by a factor of
saio_temperature_reduction_factor, which is 0.9 in the paper.

The catch is that because of join order constraints, lots of random
moves are
resulting simply invalid and are rejected, even if they still are
counted as a
full iteration of the inner loop. ISTM that it doesn't skew the results too
much, though.

I followed Tom's advice to mimick GEQO's way of choosing joins with join
clauses first, but that only is done when building the starting tree. After
that the moves are random. A related question: why does GEQO try to put the
largest "clumps" at the beginning of the list? Is it to achieve maximum
left-deepness? I didn't copy that in SAIO, I'm just adding the new clump
at the
beginning of the list.

The big question for me now is what are the reasonable values of those two
GUCs? I think that because of the amount of work make_join_rel does (handle
symmetric case, consider all access paths) they can be made stricter that in
the paper, so 16/0.9 seems wrong. OTOH, I have no idea why it takes so long
with those parameters cranked up, I would think they influence the
running time
linearly, but maybe I'm just way off base.

For the record, I'm attaching oprofile results for the standard planner,
GEQO
and SAIO (standard and tweaked, I ctrl+c'd the standard run). I was
profiling
the whole system, but what I'm attaching is the grepped out "postgres"
part. What I don't get is why cranking up the GUCs results in
generate_join_implied_equalities moving to the top of the list. I've spent
countless hours trying to understand why that happens and am currently at a
loss.

> Robert had another example in
> 603c8f070911271205r4d4534edt1cebcb76ff5066a5(at)mail(dot)gmail(dot)com that might be
> interesting.

I'll give it a shot soon, and then hopefully will do some plan quality
comparision on less pathological queries.

Cheers,
Jan

Attachment Content-Type Size
oprofile-callgraph.saio.tweaked text/plain 60.3 KB
oprofile-callgraph.saio text/plain 59.5 KB
oprofile.std application/vnd.sun.xml.draw.template 1.3 KB
oprofile.saio.tweaked text/plain 1.3 KB
oprofile.saio text/plain 1.3 KB
oprofile.geqo text/plain 1.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-01-08 14:46:53 Re: pg.dropped
Previous Message Stefan Kaltenbrunner 2010-01-08 14:41:10 Re: [COMMITTERS] pgsql: Also update ChangerLog file.