Re: Idea for the statistics collector

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Postgresql General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Idea for the statistics collector
Date: 2002-06-21 14:02:27
Message-ID: 3D1331F3.D3120C47@Yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
>
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Martijn van Oosterhout wrote:
> >> Firstly, I was only thinking of going for the basic nodes (Index Scan, Seq
> >> Scan, Distinct). Other types have far more variables. Secondly, even if you
> >> only count, it's useful. For example, if it tells you that the planner is
> >> off by a factor of 10 more than 75% of the time, that's useful information
> >> independant of what the actual variables are.
>
> > Yes, only updating the stats if the estimate was off by a factor of 10
> > or so should cut down on the overhead.
>
> And reduce the usefulness even more ;-). As a pure stats-gathering
> exercise it might be worth doing, but not if you only log the failure
> cases. How will you know how well you are doing if you take a
> biased-by-design sample?

Sure is it required to count all cases, success and failure. But I don't
see why it is required to feed that information constantly back into the
statistics tables. As long as we don't restart, it's perfectly good in
the collector. And it must not be fed back to the backend on every
query.

Maybe ANALYZE would like to have some of that information? If memory
serves, ANALYZE does a poor job when the data isn't well distributet,
has few distinct values and the like. That causes wrong estimates then
(among other things, of course). The idea could be, to have ANALYZE take
a much closer look at tables with horrible estimates, to generate better
information for those.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2002-06-21 14:05:25 Re: Solved! MacOS X and external functions
Previous Message Tom Lane 2002-06-21 13:55:06 Re: pltcl bug in 7.2?

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Wieck 2002-06-21 14:03:54 Re: Reduce heap tuple header size
Previous Message Thomas Lockhart 2002-06-21 13:47:41 Re: ADTs and embedded sql