From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A costing analysis tool
Date: 2005-10-13 06:01:40
Message-ID: s34db201.057@gwmta.wicourts.gov
Lists: pgsql-hackers

Yes, I have looked at the TODO list. There is arguably a relationship
to:

* Have EXPLAIN ANALYZE highlight poor optimizer estimates

* Log queries where the optimizer row estimates were dramatically
different from the number of rows actually found?

Neither of these, however, provides a systematic way to identify
problem areas in costing. Nor do they provide systematic regression
testing when costing is modified.
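
For reference, the raw material those items would work from is
already visible in EXPLAIN ANALYZE output, where the planner's row
estimate sits right next to the count actually returned. A minimal
example (table and numbers are hypothetical):

  EXPLAIN ANALYZE
  SELECT * FROM orders WHERE order_date > '2005-01-01';

  --  Seq Scan on orders  (cost=0.00..1693.00 rows=120 width=48)
  --                      (actual time=0.031..64.227 rows=48213 loops=1)
  --    Filter: (order_date > '2005-01-01'::date)
  --  Total runtime: 89.124 ms

Here the planner expected 120 rows but the scan returned 48213 --
exactly the sort of discrepancy those items would highlight or log.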

What largely got me thinking about starting with the tool I
describe was this post:

http://archives.postgresql.org/pgsql-hackers/2005-10/msg00434.php

Tom Lane has also mentioned the need for test cases, and expressed
doubt about whether a particular fix would help or hurt overall.
For example:

http://archives.postgresql.org/pgsql-hackers/2005-10/msg00417.php

The tool I propose would be "non-invasive" -- it would run as a
client of the back end, helping to guide and check the actual
back-end enhancements.

This all started because in some of our real-life queries the
optimizer is looking at a reasonable set of available plans, and
picking one which runs several times slower than one of the
alternatives. The problem is clearly that the cost numbers don't
approximate reality closely enough. I'm not convinced that the
proposed adjustment is a good idea -- it might cause other queries
which run fine now to shift to a suboptimal plan, and it might not
go far enough toward solving the problem case.
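
For what it's worth, that check can already be made by hand from any
client session, by disabling the chosen plan type and timing the
runner-up. A sketch with hypothetical tables, using the existing
enable_* planner settings:

  -- Time the plan the optimizer picked (say, a merge join):
  EXPLAIN ANALYZE
  SELECT c.name, o.total
  FROM customers c JOIN orders o ON o.customer_id = c.id;

  -- Take that plan type away and time the alternative:
  SET enable_mergejoin = off;
  EXPLAIN ANALYZE
  SELECT c.name, o.total
  FROM customers c JOIN orders o ON o.customer_id = c.id;
  RESET enable_mergejoin;

If the second plan carries a higher estimated cost but a lower
actual time, the cost numbers have drifted from reality; the
proposed tool would collect exactly these pairs, systematically.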

The best solution might be somewhat more sophisticated.
I suspect that consideration of effective cache size and the
expected iteration count might be necessary to get consistently
good cost estimates without breaking anything else. Nobody
wants me to try something like that without a good way to do
regression testing. At least, that's the impression I've gotten.
And it's hard to pin down where the problem really lies
without a tool like this. Personally, I suspect that part of the
problem is an underestimation of the cost of the sort or the
mergejoin.
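
To illustrate the cache-size point: the planner already has one knob
in that direction, and a client-side tool could vary it to see where
the plan choice flips. A hypothetical session (the value is counted
in 8kB disk pages):

  SHOW effective_cache_size;

  -- Suggest that the OS is caching roughly 800MB of the database;
  -- larger values make repeated index fetches look cheaper.
  SET effective_cache_size = 100000;

  -- Re-plan the problem query and see whether the choice changes:
  EXPLAIN ANALYZE
  SELECT c.name, o.total
  FROM customers c JOIN orders o ON o.customer_id = c.id;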

I had read through the TODO list several times, and in response to
your post searched it again for key words like:
tune, tuning, diagnostic, cost, estimate, and plan.
I haven't been able to spot anything that seems to address the area
covered by the proposed tool. Is there something I'm overlooking?

My client is willing to pay for my time to address the issue which is
causing them a problem, and share that work with the PostgreSQL
community. I don't think I'd get the same response regarding
something which is not a demonstrated problem for them. I'm
certainly not looking to get adversarial with anyone, or to bypass
any part of the process. I am continually impressed by the quality
of PostgreSQL, and even more impressed by the people posting to
these lists, and the assistance they provide to the community. My
client and I both hope to give something back as it meshes with
our needs and falls within the capabilities of our staff.

If this idea survives the conceptual discussions, I'll suggest a TODO
item (if nobody beats me to it), so that it's "on the record" -- that
seems only reasonable, to prevent duplicate efforts.

Thanks for your response, and any further pointers you can provide.

-Kevin

>>> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> 10/12/05 8:27 PM >>>

Have you looked at the TODO list to see our previous ideas on tuning
diagnostics?
