Re: a path towards replacing GEQO with something better

From: AJG <ayden(at)gera(dot)co(dot)nz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: a path towards replacing GEQO with something better
Date: 2021-06-19 20:38:58
Message-ID: 1624135138590-0.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I stumbled across this which may be of interest to this topic and GEQO
alternative.

The main creator/author of Neo and Bao (ML for Query Optimizer) Ryan Marcus
(finishing Postdoc and looking for job) recently posted [1] about Bao for
distributed systems.

But what was interesting was the links he had to a 2020 Neo YouTube [2]
which discussed better cardinality estimation / 90% less errors (vs.
Postgres 10) only improved query latency by 2-3%, and other MLs made worse
in other scenarios.

Other interesting takeaways from the video (summary):

PostgreSQL Query Optimizer – 40k LOC. PG10 70% worse/slower than Oracle. PG
has 3 major flaws in QO, 1 fixed in PG11. Neo 10-25% better than PG QO after
30hr training (using GPU). Neo drops to 10% better if 3 flaws were / could
be fixed.

MS SQL – 1 million LOC.

Oracle – 45-55 FTEs working on it. No LOC given by Oracle. Appear to focus
on TPC-DS. NEO better than Oracle after 60hr training (using GPU).

Humans and hand tuning will always beat ML. I.e. Neo (and Bao) good for
those who cannot afford a fulltime DBA doing query optimizing.

Bao – follow-on work from Neo.
“This is a prototype implementation of Bao for PostgreSQL. Bao is a learned
query optimizer that learns to "steer" the PostgreSQL optimizer by issuing
coarse-grained query hints. For more information about Bao”

BAO GitHub here [3] and is AGPLv3 license (not sure if that’s good or bad).

Bao drawbacks… (but may not matter from a GEQO perspective??)

“Of course, Bao does come with some drawbacks. Bao causes query optimization
to take a little bit more time (~300ms), requiring quite a bit more
computation. We studied this overhead in our SIGMOD paper. For data
warehouse workloads, which largely consists of long-running, resource
intensive queries, Bao’s increased overhead is hardly noticeable. However,
for workloads with a lot of short running queries, like OLTP workloads, this
might not be the case. We are currently working on new approaches to
mitigate that problem – so stay tuned!”

[1] https://rmarcus.info/blog/2021/06/17/bao-distributed.html
[2] https://www.youtube.com/watch?v=lMb1yNbIopc
Cardinality errors impact on latency - Starting at 8:00, interesting at
10:10 approx.
[3] https://github.com/learnedsystems/baoforpostgresql

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2021-06-19 20:39:18 Re: Race between KeepFileRestoredFromArchive() and restartpoint
Previous Message Tom Lane 2021-06-19 19:16:50 Re: Race condition in InvalidateObsoleteReplicationSlots()