Re: generalizing the planner knobs

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-04 13:57:19
Message-ID: 1133704639.2906.840.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2005-12-02 at 11:07 +0100, Csaba Nagy wrote:
> On Thu, 2005-12-01 at 22:01, Tom Lane wrote:
> > Greg Stark <gsstark(at)mit(dot)edu> writes:
> > > On the other hand the type I would prefer to see are hints that feed directly
> > > into filling in information the planner lacks. This only requires that the
> > > user understand his own data and still lets the planner pick the best plan
> > > based on the provided information.
> >
> > This would avoid some issues, but it still is vulnerable to the problem
> > that the hint you put in your code today will fail to track changes in
> > your data tomorrow.
>
> Tom, I have to disagree here. At least in our application, we must
> provide for an acceptable worst case scenario, and sometimes a slightly
> wrong estimate can lead to a plan which is very fast 99% of the time but
> completely wrong in 1% of the cases. Sometimes the percentage is 50/50.
> I've had this situation with some "limit" plans where the planner had
> chosen a wrong index. The problem there was that the planner had
> estimated that the query will have 20 rows as a result, but it had less,
> and resulted in the complete scan of the index... as opposed to a much
> smaller scan that would have resulted by scanning the other index, as
> that one would have provided an end condition orders of magnitudes
> sooner. Now the statistics will always be only an estimation, and +/- a
> few can really make a big difference in some situations. In this
> particular situation the index choice of the planner would have been
> faster for all cases where there were really 20 rows returned, but I
> forced it to always choose the other plan (by adding the proper order
> by) because I can't risk a bad result in any of the cases.
> In this particular case I was able to force the planner choose a
> specific plan, but that might not be always possible, so I guess it
> really would make sense to be able to tell the planner how selective
> some conditions are. And yes, sometimes I would like to "freeze" a
> specific "safe" plan for a specific query, even if it is not optimal.

Csaba raises a good point here. Many people say they want "hints" when
what they actually require the plan to be both stable and predictable.

Tom is right to point out that data can change over time. However,
experience with packaged application tuning is that you actually do want
to have things work in a stable way, even if that is somewhat
sub-optimal because when you have 1000s of statements it is important
that it doesn't change after you tune it - otherwise you never finish.
So I would like to give that requirement a name "Plan Stability"; the
actual solution to that could be many things.

Another aspect to this is predictability. At the moment, we optimise
according to the first parameter a prepared statement is bound with.
Many data distributions contain a small number of values that represent
a large fraction of the total. This can mean that it is pretty random
whether we will get a SeqScan and be stuck with it, or get an IndexScan
and be stuck with it. Either plan being wrong 50% of the time. In these
cases, hinting is definitely a very bad thing, since whichever you hint,
you'll be wrong. This situation gives us two more requirements:
- predictability - because we want to know the worst case
- flexibility - because we want to be able to take advantage of the best
case, but without causing an unconstrained worst case

So IMHO, the requirements list for prepared statement planning is that
optimization must be:
- Flexible
- Predictable
- Stable

The actual solutions to all of those things could be many and varied.

ISTM we could do some of that with another GUC, lets call it
prepare_once = on. The system default is to have a prepared statement
bound to a plan on its first parameter bind. If we set this to "off",
then the statement will replan each time we bind. This would give us
both flexibility and predictability. (As ever, someone suggest a better
name?).

The requirements for very large statement tuning are fairly different
from that. e.g. large reports or Data Warehousing queries. In those
cases, Tom's warning about the data changing can be critical and we
don't have the same need to prepare queries. Neil's original point about
needing to avoid SeqScans on some tables but not others hasn't really
been addressed. That got turned into hinting on particular nodes/node
types, but there are still issues: How do you identify one node in a
complex plan?

Perhaps the way forward would be to make enable_* accept a list of
tables, or a * (like listen_addresses). (And make "on" a synonym for "*"
and off a synonym for an empty list). That way, you'd be able to control
on a per-table basis what plan types are possible. (Still wouldn't work
that well for complex plans where same table referenced multiple times,
but its a step in the right direction.) I'm not sure I like that
particular idea much, but I'm attempting to address Neil's original
point, with which I agree.

Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2005-12-04 15:49:52 Re: generalizing the planner knobs
Previous Message Mario Weilguni 2005-12-04 08:58:15 Re: Strange left join problems in 8.1