Re: A costing analysis tool

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: A costing analysis tool
Date: 2005-10-13 01:27:46
Message-ID: 200510130127.j9D1Rks06511@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Have you looked at the TODO list to see our previous ideas on tuning
diagnotics?

---------------------------------------------------------------------------

Kevin Grittner wrote:
> I'm looking at trying to fix some clear flaws in costing which cause
> of our real-world queries to choose sub-optimal plans under PostgreSQL.
> It's clear that there needs to be a tool to analyze the accuracy of
> costing for a variety of queries, both to direct any efforts to fix
> problems and to test for possible costing regressions. As far as I can
> tell, no such tool currently exists. If I've missed something, please
> let me know, even if it's ad hoc or incomplete.
>
> Note that I'm talking about a tool strictly to check the accuracy of
> the estimated costs of plans chosen by the planner, nothing else.
>
> I'm at the rough planning stage, and would appreciate any feedback on
> my thoughts before I start actual development. Considerations, in no
> particular order (but numbered for easy reference):
>
> (1) Most of the time, on a well-configured database, PostgreSQL
> chooses a plan which performs very well. Many test cases need to cover
> these normal cases, both to set a baseline and to catch any regression.
>
> (2) A large database must be created for these tests, since many
> issues don't show up in small tables. The same data must be generated
> in every database, so results are comparable and reproducable.
>
> (3) Developers should be able to easily add test cases, either for
> their own use or contributed to the community.
>
> (4) The same test query should be able to easily run in different
> permutations of the following:
>
> (a) With no data cached before the run, or as fully cached as possible.
> (b) With various enable_xxx settings on or off.
> (c) With or without significant dead space in the tables/indexes.
>
> (5) The tool needs to be able to run in a variety of OS environments.
> At a minimum, this means some way to pick up configuration information
> to specify how to start and stop the back end, and how to flush the
> system cache.
>
> (6) The relative costs of various plans shifts dramatically when C
> asserts are enabled. To avoid misleading results, the tool should warn
> the user when run on a build configured with --enable-cassert, and all
> results from such an environment should be conspicuously identified as
> such.
>
> (7) I envision a process to create a test database, populate it, run a
> series of test cases with EXPLAIN ANALYZE, capture the results, parse
> the results and store them in a database, analyze the results to find
> means and standard deviations both overall and for each type of plan,
> and report summaries and outliers -- with references to the test cases.
> The primary statistic of interest is actual time divided by cost. This
> seems like it would be of interest overall, and within the permutations
> mentioned above for a single query.
>
> A reasonable set of test cases would illuminate where costing
> adjustments would provide the most benefit for the least risk. By
> reining in the most extreme outliers, we could allow the planner to
> recognize the best plan among the options it is considering with
> greater accuracy. This tool could be used as a regression test to
> ensure that a costing adjustment didn't distort the cost of something
> which had been accurately costed.
>
> So, what do you think?
>
> -Kevin
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Christopher Kings-Lynne 2005-10-13 02:15:48 Re: Comments on columns in the pg_catalog tables/views
Previous Message Bruce Momjian 2005-10-13 01:23:59 Re: How TODO prevent PQfnumber() from lowercasing?