Re: join ordering via Simulated Annealing

From: bin wang <comx0330(at)gmail(dot)com>
To: Jan Urbaski <wulczer(at)wulczer(dot)org>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: join ordering via Simulated Annealing
Date: 2009-12-23 01:57:55
Message-ID: 323338680912221757p3028180co317b923e71950f98@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I will follow it.
Thank you.

2009/12/23 Jan Urbaski <wulczer(at)wulczer(dot)org>

> Hi,
>
> I've been playing with using a Simulated Annealing-type algorithm for
> determinig join ordering for relations. To get into context see
> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00098.php
> (there's also a TODO in the wiki). There's a nice paper on that in
>
> http://reference.kfupm.edu.sa/content/h/e/heuristic_and_randomized_optimization_fo_87585.pdf
> (also linked from that thread) and someone even wrote a patch:
> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00736.php
>
> This generally aims at being able to replace GEQO.
>
> I have some rough prototype code, but I'm not even asking people to look
> at it. It's stuffed with silly things and writes three Graphviz-style
> .dot files in /tmp for each algorithm step. But I'd like to at least
> present the idea.
>
> There are three main problems that have to be solved to get a good
> Simulated Annealing implementation:
> o) choosing the starting point for the algorithm
> o) generating the next point
> o) computing the cost in the current point
>
> The code I have now creates the initial plan by doing something similar
> to what gimme_tree does in GEQO, but without any kind of heuristic to
> try to favour joins with join restrictions (the idea is that it doesn't
> matter, since we only want to get *any* plan and we only do it once),
> but ideally it would be somehow chosen randonly from the space of all
> possible join orderings.
>
> I keep a binary tree of relations in memory, where leaves are
> RelOptInfos with 1 relid and the root is a RelOptInfo with all relids.
> Each iteration of the algorithm picks two nodes at random (with some
> restrictions, but that's not important) and tries to swap them around,
> meaning that a tree like (use a monospace font for best results):
>
> (1 2 3 4)
> *(1 2) (3 4)
> (1) (2) *(3) (4)
>
> where the parenthesised things are the two chosen nodes would get
> transformed into
>
> (1 2 3 4)
> (3) (1 2 4)
> (1 2) (4)
> (1) (2)
>
> that is, the (1 2) node and its subtree gets swapped with the (3) node
> and the upper-level nodes get changed accordingly. Sometimes a swap like
> that will produce an invalid join ordering - then swap is then reverted.
> If the join order given by the tree after the swap is legal, the paths
> are recomputed, much like in geqo_eval, and if the root node's cheapest
> path is not cheaper that before the swap, the swap gets reverted.
> Simulated Annealing algorithms permit uphill moves in terms of cost,
> with some probability that's decreasing as time passes, but that's not
> implemented yet. After a fixed amount of moves, the algorithm stops and
> returns the root node of the tree as the single RelOptInfo.
>
> The issues with the approach are:
>
> o) the initial state is not really a random plan, it's usualy a
> left-deep tree (and is very inefficient) and this might skew results.
>
> o) is swapping random nodes like that a good way of exploring the
> solution space? The SA paper suggests something much simpler, but some
> of the moves proposed there don't really make sense when using the
> make_join_rel machinery:
> *) changing the inner and outer relation of a join doesn't make
> sense, because make_join_rel is symmetrical
> *) changing the join executing strategy (nested loop, merge join,
> etc.) doesn't make sense, because make_join_rel considers all possible
> paths for a join
>
> o) each swap needs a full recalcualtion of the whole solution
> (geqo_eval does that, for instance), maybe it's possible to leave the
> subtrees of swapped nodes intact and only recalculate the nodes above
> the two swapped nodes?
>
> o) because make_join_rel scribbles on the PlannerInfo, the same hack as
> in geqo_eval is necessary: truncating join_rel_list after building all
> joinrels and nulling out the join_rel_hash
>
> o) I use make_join_rel to determine whether a join is legal, which does
> lots of other things. That looks like wasted effort, although in the end
> each time I need to build the final rel to assess the resulting path's
> cost. To follow the SA algorithm spirit more closely it would be
> necessary to only consider one, random path for each relation and make
> using different paths a move that the algorithm can choose while
> exploring the solution space. A cheaper way of determining the current
> solution's cost would be nice, too - fully rebuilding the final
> RelOptInfo after each move is annoying.
>
> Lastly, I'm lacking good testcases or even a testing approach: I'm
> generating silly queries and looking at how they get optimised, but if
> someone has a real dataset and examples of joins that cannot be planned
> with the standard planner, I would be interested to compare the results
> my prototype gets with those produced by GEQO.
>
> The code, a module that you can LOAD into a backend, is here (if you
> really want to see it):
> http://git.wulczer.org/?p=saio.git
> Set the saio GUC to true and saio_cutoff to the number of iterations you
> want.
>
> Cheers,
> Jan
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-12-23 03:01:40 Re: creating index names automatically?
Previous Message Jan Urbański 2009-12-23 01:23:55 join ordering via Simulated Annealing