Re: Enabling and Disabling Sequencial Scan

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Enabling and Disabling Sequencial Scan
Date: 2003-06-02 15:20:28
Message-ID: Pine.LNX.4.33.0306020857110.11997-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, 30 May 2003, Robert Creager wrote:

> On Fri, 30 May 2003 14:46:12 -0600 (MDT)
> "scott.marlowe" <scott(dot)marlowe(at)ihs(dot)com> said something like:
>
> >
> > level cache is. On my machine it's about 800 meg. It's measured in 8k > blocks, so 100,000 * 8k ~ 800 meg. The smaller this is, the more
>
> Any thoughts on how to figure this out (disk buffer size)? For some
> reason, my system (2xAMD 2800+, 2Gb RAM 2.4.21 - /proc/meminfo) only
> shows a usage of 88kb of 'Buffers' usage, and that never changes. My
> 'Cached' usage is 1.7Gb. I've hit the kernel mailing list, and the one
> response I got said don't worry about it :-(

Are you sure that's not 88213Kb or so of buffers? 88kb is awfully small.

It's normal to have a cache size many times larger than the buffer size.
Buffers are assigned to individual disks, and sit under the larger single
pool that is the cache.

I just take the approximate size of the cache under load and use that for
the effective_cache_size. Since it's pretty much a "fudge factor"
variable anyway.

P.s. My use of the term fudge factor here is in no way meant to be
derogatory. It's just that as long as the effective cache size is within
some reasonable range of the actual cache/buffer in the machine, it'll be
close enough to push the query planner in the right direction.

Note that you can adopt two philosophies on the planner. One is that the
planner will always make certain mistakes, and you've got to fool it in
order to get the right query plan.

The other philosophy is that you give the query planner all the variables
you can reasonably give it to let it decide the proper course of action,
and you fine tune each one so that eventually it makes the right choice
for all the querys you throw at it.

While the first philosophy provides for the fastest functional solutions
on a static platform (i.e. we're running postgresql 7.0.2 and aren't going
to upgrade.) but it kind of supports the idea that the query planner can
never be given the right information and programmed with the right code to
make the right decision 99% of the time, and when it makes the wrong
decision, it's only a little wrong.

The second choice will require you to spend more time fine tuning all the
parameters fed to the query planner with your own queries using explain
analyze and repeated testing with different settings.

What I look for are the corner cases. I.e. if I do some select that
returns 500 records with a seq scan, and it takes 5 seconds, and with 450
records it switches to index scan and takes 1 second, then likely the
planner is choosing to switch to seq scans too quickly when I raise the
result size from 450 to 500.

At this point use the set seq_scan option to test the database
performance with it on and off and increasing set size.

Somewhere around 2,000 or so in this scenario, we'll notice that the seq
scan has now the same speed as the index scan, and as we raise the number
of rows we are getting, the index scan would now be slower than the seq
scan.

Assuming we set effective_cache_size right at the beginning, we now can
turn seq_scan back on, and adjust the default cost options until the
planner chooses a seq scan at the break point we found (in our imaginary
case of 2000). It doesn't have to be perfect, since the performance at or
around the break point is similar for index and seq scans alike.

Then, throw the next query at it and see how it does.

I've found that on fast machines, it's good to lower the cpu costs,
especially the index one. I usually drop these by a divisor of 2 to 10.
For the random_page_cost, settings of 1.x to 2.x seem a good choice for
fast I/O subsystems.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message scott.marlowe 2003-06-02 15:23:11 Re: Enabling and Disabling Sequencial Scan
Previous Message Tom Lane 2003-06-02 13:36:31 Re: Degrading performance