Re: More thoughts about planner's cost estimates

From: Mike Benoit <ipso(at)snappymail(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Rod Taylor <pg(at)rbt(dot)ca>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: More thoughts about planner's cost estimates
Date: 2006-06-03 19:15:31
Message-ID: 1149362131.656.43.camel@ipso.snappymail.ca
Lists: pgsql-hackers

pgbench appears to already support arbitrary SQL queries with the -f
switch, so why couldn't we just make it a little smarter and have people
enable SQL query logging for a day or two, then pass the log off to
pgbench:

pgbench -f <log file>

Seems to me that wouldn't be too difficult to do, and it would give
results much closer to the "real world" than pgbench's built-in benchmark.
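
As a rough sketch of how that could work with what exists today (assuming
log_statement = 'all' captures the workload, plus a naive conversion step,
since pgbench -f expects one SQL statement per line rather than a raw
server log; the file and database names below are just placeholders):

# postgresql.conf: capture every statement for a day or two
log_statement = 'all'

# strip the log prefix so only the SQL remains (the exact prefix depends
# on log_line_prefix), then replay the result through pgbench
sed -n 's/.*LOG:  statement: //p' postgresql.log > replay.sql
pgbench -n -f replay.sql -c 10 -t 1000 mydb

Multi-line and parameterized statements wouldn't survive a sed like that,
which is exactly where pgbench (or a helper tool) would need to get "a
little smarter".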

On top of that, the community could start offering up "template"
benchmarks such as "busy website", "data warehouse", "forums", and
"financial", and distribute them with pgbench:

pgbench -f templates/data_warehouse.pgbench
pgbench -f templates/forums.pgbench
...
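
For what it's worth, a template could just be an ordinary pgbench custom
script, i.e. SQL statements one per line. Something hypothetical like this
for the "forums" case (made-up table names; a real template would
presumably ship with a matching schema and data generator):

SELECT * FROM threads ORDER BY last_post DESC LIMIT 25;
SELECT * FROM posts WHERE thread_id = 42;
INSERT INTO posts (thread_id, body) VALUES (42, 'benchmark post');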

From that point, a brute-force auto-tune utility would be pretty
straightforward to write:

pgautotune -f templates/data_warehouse.pgbench,myapp.sqllog

Or if one server runs multiple custom apps that you want to tune for:

pgautotune -f myapp1.sqllog,myapp2.sqllog,myapp3.sqllog

Even if it took 48 hours to run, it would be a good burn-in test for a
brand-new server. ;)
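
pgautotune doesn't exist yet, of course, but the brute-force idea is
basically a parameter sweep. A crude shell sketch (sweeping only
random_page_cost here, and using libpq's PGOPTIONS to set it per run;
names are placeholders):

#!/bin/sh
# hypothetical brute-force sweep: try a few random_page_cost values
# against a template workload and compare the tps pgbench reports
for rpc in 1.0 2.0 3.0 4.0; do
    echo "random_page_cost = $rpc"
    PGOPTIONS="-c random_page_cost=$rpc" \
        pgbench -n -f templates/data_warehouse.pgbench -c 10 -t 1000 mydb
done

A real tool would sweep several settings at once and keep whichever
combination gives the best throughput.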

On Fri, 2006-06-02 at 19:38 -0400, Tom Lane wrote:
> Rod Taylor <pg(at)rbt(dot)ca> writes:
> >> One objection to this is that after moving "off the gold standard" of
> >> 1.0 = one page fetch, there is no longer any clear meaning to the
> >> cost estimate units; you're faced with the fact that they're just an
> >> arbitrary scale. I'm not sure that's such a bad thing, though. For
> >> instance, some people might want to try to tune their settings so that
> >> the estimates are actually comparable to milliseconds of real time.
>
> > Any chance that the correspondence to time could be made a part of the
> > design on purpose and generally advise people to follow that rule?
>
> We might eventually get to that point, but I'm hesitant to try to do it
> immediately. For one thing, I really *don't* want to get bug reports
> from newbies complaining that the cost estimates are always off by a
> factor of X. (Not but what we haven't gotten some of those anyway :-()
> In the short term I see us sticking to the convention that seq_page_cost
> is 1.0 in a "typical" database, while anyone who's really hot to try to
> make the other happen is free to experiment.
>
> > If we could tell people to run *benchmark* and use those numbers
> > directly as a first approximation tuning, it could help quite a bit
> > for people new to PostgreSQL experiencing poor performance.
>
> We don't have such a benchmark ... if we did, we could have told
> people how to use it to set the variables already. I'm very very
> suspicious of any suggestion that it's easy to derive appropriate
> numbers for these settings from one magic benchmark.
>
> regards, tom lane
--
Mike Benoit <ipso(at)snappymail(dot)ca>
