Suggestion: Helping the optimizer

From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Suggestion: Helping the optimizer
Date: 2002-10-11 07:21:29
Message-ID: 3DA67BF9.6080504@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I guess we had this discussion before but I have just gone through the
general list and I have encountered a problem I had a least VERY often
before.
Sometimes the planner does not find the best way through a query.
Looking at the problem of query optimization it is pretty obvious that
things like that can happen. The planner is a wonderful piece of
software and I have a high esteem of people working on it.

In some cases the planner fails because it is impossible to optimize
every query coming along - this is a natural thing.
In case of very complex SQL statements it would be wonderful to have a
command which allows the user to turn an INDEX on or off temporarily.
This would solve 90% of all problems people have with the planner.
People say that 10% of all queries cause 90% of the load. If we could
help those 10% we could gain A LOT of performance with very little effort.
Improving other things help of lot as well but in some cases the planner
decides whether a query can be done or not. YES/NO is a much bigger
problem than 5% faster or not.

Just have a look at a query like that:

$database->dbi_select("SELECT a.code, b.code, t_gruppe.id, t_strukturtyp.id,
t_struktur.id,t_struktur.oid
FROM t_master, t_struktur, t_strukturtyp, t_gruppenelement,
t_gruppe, t_text AS a, t_text AS b, t_betriebdetail,
t_strukturbetrieb
WHERE t_master.master_id = '$sportort'
AND t_master.slave_id = t_struktur.id
AND t_struktur.typid = t_strukturtyp.id
AND t_strukturtyp.kommentar = 'betrieb'
AND get_bezahlt(t_struktur.id) = 't'
AND t_strukturtyp.id = t_gruppenelement.suchid
AND t_gruppenelement.icode = 'strukturtyp'
AND t_gruppenelement.gruppeid = t_gruppe.id
AND a.suchid = t_gruppe.id
AND a.icode = 'gruppe'
AND a.sprache = $session{lang}
AND a.texttyp IS NULL
AND b.suchid = t_struktur.id
AND b.icode = 'struktur'
AND b.sprache = $session{lang}
AND b.texttyp IS NULL
AND t_gruppe.sortierung >= getmin('basic')
AND t_gruppe.sortierung <= getmax('basic')
AND t_struktur.id IN (
SELECT DISTINCT a.refid
FROM t_punkte AS a,t_text AS
b,t_struktur AS c
WHERE a.refid=b.suchid
AND a.icode='struktur'
AND b.icode='struktur'
AND a.refid=c.id
AND b.sprache=1
AND a.bildid='$picdata[0]'
AND b.texttyp IS NULL )
AND t_betriebdetail.von < now()
AND t_betriebdetail.strukturbetriebid =
t_strukturbetrieb.betriebid
AND t_strukturbetrieb.strukturid = t_struktur.id
ORDER BY t_gruppe.sortierung, t_strukturtyp.sortierung,
t_betriebdetail.leistung , b.code");

This has been taken from a real world application I have written a few
weeks ago (unfortunately it is German).
In this case the planner does it absolutely right. There are subqueries
and functions and many other ugly things for the planner but it works.
What should I do if it doesn't work?
Well, I could turn seq scans off globally even if I knew that there is
just one table causing high execution times. People can easily imagine
that a bad execution plan can lead to really bad performance -
especially when there are millions of records around. By tweaking the
optimizer a little we could gain 100% percents of performance. (idx scan
vs. nested loop and seq scan or something like that).

I guess the patch for this tweaking stuff could be fairly easy.
Currently I am abusing system tables to get the problem fixed (which is
bad for other queries of course). Running VACUUM is not that funny if
the data in the system tables is mistreated.

Concern:
People might think this is ANSI: I know that this can be a problem but
is it better if people start abusing system tables or think that
PostgreSQL is bad or slow?

Take the time and fix the planner: I can fully understand this concern.
However, there is no way to fix the optimizer to do it right in every
case. The planner is really good but I am talking about 3% of all those
queries out there - unfortunately they cause 90% of the problems people
have.

I have taken this query so that people can see that the planner is doing
good work but people should also think of a situation where a query like
that can cause severe head ache ...

maybe this problem should be discussed from time to time.

Best regards,

Hans

<http://kernel.cybertec.at>

Browse pgsql-hackers by date

  From Date Subject
Next Message Anuradha Ratnaweera 2002-10-11 10:16:57 Peer to peer replication of Postgresql databases
Previous Message Mario Weilguni 2002-10-11 07:14:50 number of attributes in page files?