Re: generalizing the planner knobs

From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 10:07:06
Message-ID: 1133518026.4779.132.camel@coppola.muc.ecircle.de
Lists: pgsql-hackers

On Thu, 2005-12-01 at 22:01, Tom Lane wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > On the other hand the type I would prefer to see are hints that feed directly
> > into filling in information the planner lacks. This only requires that the
> > user understand his own data and still lets the planner pick the best plan
> > based on the provided information.
>
> This would avoid some issues, but it still is vulnerable to the problem
> that the hint you put in your code today will fail to track changes in
> your data tomorrow.

Tom, I have to disagree here. At least in our application, we must
provide for an acceptable worst case scenario, and sometimes a slightly
wrong estimate can lead to a plan which is very fast 99% of the time but
completely wrong in 1% of the cases. Sometimes the percentage is 50/50.
I've had this situation with some "limit" plans where the planner had
chosen the wrong index. The problem there was that the planner had
estimated that the query would return 20 rows, but it returned fewer,
which resulted in a complete scan of the index... as opposed to the much
smaller scan that the other index would have allowed, as that one would
have provided an end condition orders of magnitude sooner. Now, the
statistics will always be only an estimate, and being off by a few rows
can really make a big difference in some situations. In this
particular situation the index choice of the planner would have been
faster for all cases where there were really 20 rows returned, but I
forced it to always choose the other plan (by adding the proper order
by) because I can't risk a bad result in any of the cases.
In this particular case I was able to force the planner to choose a
specific plan, but that might not always be possible, so I guess it
really would make sense to be able to tell the planner how selective
some conditions are. And yes, sometimes I would like to "freeze" a
specific "safe" plan for a specific query, even if it is not optimal.
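
To make the "limit 20" failure mode concrete, here is a minimal sketch in
Python. It is a toy model, not the actual query or schema: the data, the
function names and the row counts are all hypothetical. It only counts
index entries visited and ignores every other cost, so it shows the
worst-case blow-up but not the cases where the planner's choice wins.

```python
import bisect


def filtered_walk_visits(index_order, wanted, limit):
    """Walk an index ordered by the sort column, filter each entry,
    and stop once `limit` matches are found. Returns entries visited."""
    visited = 0
    found = 0
    for value in index_order:
        visited += 1
        if value == wanted:
            found += 1
            if found == limit:
                break
    return visited


def range_scan_visits(sorted_values, wanted, limit):
    """Use an index on the filtered column: jump to the first match and
    stop at `limit` matches or when the key changes."""
    pos = bisect.bisect_left(sorted_values, wanted)
    visited = 0
    while (visited < limit and pos < len(sorted_values)
           and sorted_values[pos] == wanted):
        visited += 1
        pos += 1
    return visited


def make_index(n_rows, match_positions):
    """Hypothetical data: 1 marks a matching row, 0 a non-matching one."""
    matches = set(match_positions)
    return [1 if i in matches else 0 for i in range(n_rows)]


N_ROWS = 100_000
LIMIT = 20
enough = make_index(N_ROWS, range(0, 200, 10))  # 20 matching rows
short = make_index(N_ROWS, range(0, 190, 10))   # 19 matching rows

print(filtered_walk_visits(enough, 1, LIMIT))       # 191
print(filtered_walk_visits(short, 1, LIMIT))        # 100000
print(range_scan_visits(sorted(enough), 1, LIMIT))  # 20
print(range_scan_visits(sorted(short), 1, LIMIT))   # 19
```

With one matching row fewer than the limit, the filtered walk never
reaches its stop condition and visits every entry, while the range scan
stays bounded by the number of matching rows.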

So for me the "hint" mechanism is good for telling the server that I'm
not interested in the BEST plan if it risks getting very bad on
occasion, but rather in a good-enough plan which is safe.

And as for selectivity changing over time, the hints will change along
with it. In most situations where selectivity changes, the SQL has to
change too, sometimes even the complete workflow. I find that if
changing hints helps on some occasions, then having them would mean less
maintenance than the code rewriting that would otherwise be involved...
and I'm completely sure the server can't compensate for changes in the
dynamics of the data all the time. And it definitely can't keep up
with highly dynamic data, where the statistics change constantly in big
tables...

Our application, for example, has a kind of batch processing, where we
insert smaller or larger batches of data in a HUGE table (~200 million
of rows), and then that data is immediately used for different
operations and then reports, and furthermore it is heavily updated. I
can't think of any reasonable statistics target and ANALYZE strategy
which could satisfy both small batches and large batches without running
ANALYZE permanently with high statistics targets on the key fields...
and even that would not be specific enough when "limit 20" is involved.
For queries involving this table I really would like to freeze plans, as
any misplanning has bad consequences.
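
The staleness problem with freshly inserted batches can be sketched the
same way. This is a toy frequency table in Python, not PostgreSQL's
actual estimator: the `analyze` and `estimate_rows` helpers, the batch
ids and the row counts are all assumptions made for illustration.

```python
from collections import Counter


def analyze(batch_ids):
    """Snapshot of per-value frequencies, standing in for ANALYZE output."""
    total = len(batch_ids)
    counts = Counter(batch_ids)
    return {value: count / total for value, count in counts.items()}


def estimate_rows(stats, table_size, value):
    """Estimated matching rows = stored frequency * current table size."""
    return round(stats.get(value, 0.0) * table_size)


# Hypothetical table: one batch id per row.
table = [1] * 5000 + [2] * 5000
stats = analyze(table)    # statistics gathered before the new batches

table += [3] * 10         # a small batch arrives
table += [4] * 10_000     # then a large one

for batch in (3, 4, 1):
    print(batch, estimate_rows(stats, len(table), batch), table.count(batch))
# 3 0 10
# 4 0 10000
# 1 10005 5000
```

Until the statistics are refreshed, the small and the large batch look
identical to this estimator, and the estimate for the old data has
drifted as well.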

Cheers,
Csaba.
