## TwoPO: experimental join order algorithm

From: Adriano Lange Pg Hackers TwoPO: experimental join order algorithm 2010-07-24 13:20:01 4C4AE881.2030904@gmail.com (view raw, whole thread or download thread mbox) 2010-07-24 13:20:01 from Adriano Lange  2010-07-25 21:44:00 from Robert Haas   2010-07-25 22:45:39 from Adriano Lange    2010-07-25 23:17:37 from Robert Haas     2010-07-26 02:17:30 from Adriano Lange  2010-07-30 11:02:32 from Jan Urbański   2010-08-05 14:08:25 from Adriano Lange pgsql-hackers
```Hi,

I'd like to release the last version of my experimental join order
algorithm (TwoPO - Two Phase Optimization [1]):

http://git.c3sl.ufpr.br/gitweb?p=lbd/ljqo.git;a=summary

This algorithm is not production-ready, but an experimental set of
ideas, which need to be refined and evaluated. As the join order
optimization is a hard problem, the evaluation of a search strategy is
also a hard task. Therefore, I think the most important TODO item
related to replacement of GEQO algorithm is to define a set of
evaluation criteria considered as relevant.

TwoPO is encapsulated in a plug-in called LJQO (Large Join Query
Optimization [2]). This plug-in has two main GUC variables:

ljqo_threshold = N (like geqo_threshold)
ljqo_algorithm = {twopo|geqo}

As its name means, TwoPO has internally two search strategies that
constitute its two phases of optimization:

* Iterative Improvement (II) – Biased Sampling + Local Search
* Simulated Annealing (SA)

This algorithm also works in two search spaces:

* deep-trees (subset of bushy-trees)
- list of baserels
- initial states: biased sampling using
Prim algorithm over join graph (new, very efficient)
- moves: swap, 3cycle [2]
* bushy-trees
- binary join tree representation
- initial states: biased sampling using
Kruskal's algorithm over join graph [3,4].
- moves: associative

You can modify the functionality of TwoPO through the following parameters:

twopo_bushy_space = {true|false}
- set it to false if you want only deep-trees
default=true
twopo_heuristic_states = {true|false}
- enables heuristic to initial states
default=true
twopo_ii_stop = Int
- number of initial states
default=10
twopo_ii_improve_states = {true|false}
- find local-minimum of each initial state
default=true
twopo_sa_phase = {true|false}
- enables Simulated Annealing (SA) phase
default=true
twopo_sa_initial_temperature = Float
- initial temperature for SA phase
default=0.1
twopo_sa_temperature_reduction = Float
- temperature reduction
default=0.95
twopo_sa_equilibrium = Int
- number of states generated for each temperature
(Int * State Size)
default=16

References:

[1] Yannis E. Ioannidis e Younkyung Kang. Randomized algorithms for
optimizing large join queries. SIGMOD Rec., 19(2):312-321, 1990.

[2] Arun Swami e Anoop Gupta. Optimization of large join queries. SIGMOD
'88: Proceedings of the 1988 ACM SIGMOD international conference on
Management of data, pages 8-17, New York, NY, USA, 1988. ACM.

[3] P.B. Guttoski, M. S. Sunye, e F. Silva. Kruskal's algorithm for
query tree optimization. IDEAS '07: Proceedings of the 11th
International Database Engineering and Applications Symposium, pages
296-302, Washington, DC, USA, 2007. IEEE Computer Society.

[4] Florian Waas e Arjan Pellenkoft. Join order selection - good enough
is easy. BNCOD, pages 51-67, 2000.

Att,