Re: AW: [HACKERS] Some notes on optimizer cost estimates

From: Don Baccus <dhogaza(at)pacifier(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>, Zeugswetter Andreas SB <ZeugswetterA(at)wien(dot)spardat(dot)at>
Cc: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'pgsql-hackers(at)postgreSQL(dot)org'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: AW: [HACKERS] Some notes on optimizer cost estimates
Date: 2000-01-25 23:38:04
Message-ID: 3.0.1.32.20000125153804.00f98e30@mail.pacifier.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At 12:16 AM 1/26/00 +0100, Peter Eisentraut wrote:
>On 2000-01-24, Zeugswetter Andreas SB mentioned:

>> My points are:
>> 1. even if it is good for an optimizer to be smart,
>> it is even more important, that it is predictable
>
>ISTM that by the nature of things the most important capability of an
>optimizer is to yield optimal results.

One problem, though, is that the optimization problem's even more
intractable for database systems than it is for compilers. Large and
growing database installations go through an evolutionary process of
adding memory, spindles, partioning of data, etc. Changing from the
"-B" default to "-B 2000", as I've done on my web site, causes the
database to live in shared memory even when other activity on the
system would tend to cause some of it to be flushed from the filesystem
cache. This changes how long, on average, it takes to read a block.

And the first time a table's referenced is always going to take longer
than subsequent references if there's caching involved...so in
theory the optimizer should take that into account if it makes presumptions
about cache hit ratios.

So forth and so forth.

I'm not disagreeing in the least with the spirit of your comment. There
are all sorts of practical barriers...

Thankfully, for the particular problem of choosing between and index
vs. sequential scan, the optimizer only has two choices to make. Thus
a rough-and-ready heuristic based on certain assumptions might work well,
which is essentially what the optimizer does today. In fact, it does
work quite well, come to think of it!

Fiddling the numbers underlying the assumptions may be good enough for
this task.

> This, however, does not have to be
>mutually exclusive with predictability. If you estimate some CPU and disk
>parameters and write them into a config file, then you can always give
>this config file to a bug fixer. It will still work on his machine, just
>less than optimally.
>
>> 2. I compile on test machine, production is completely different
>> (more processors, faster disks and controllers)
>
>You're completely right. This has no place in configure. It will have to
>be a separate tool which you can run after building and installing.

Based perhaps on statistics gathered while the system is running...

- Don Baccus, Portland OR <dhogaza(at)pacifier(dot)com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2000-01-25 23:55:59 Re: Happy column adding (was RE: [HACKERS] Happy column dropping)
Previous Message Peter Eisentraut 2000-01-25 23:16:15 Re: [HACKERS] Some notes on optimizer cost estimates