Re: Not using index

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Bas Scheffers <bas(at)scheffers(dot)net>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Not using index
Date: 2004-02-12 15:23:46
Message-ID: Pine.LNX.4.33.0402120810140.4413-100000@css120.ihs.com
Lists: pgsql-general

On Thu, 12 Feb 2004, Bas Scheffers wrote:

> Hi,
>
> I have a table with about 100K rows, on which I have created a btree index
> of the type table_name(int, int, int, timestamp).
>
> At first postgres was using it for my AND query on all four columns, but
> after dropping it and creating different ones and testing, it suddenly
> stopped using it. Vacuuming, reindexing, recreating the table and even
> recreating the database all didn't help.
>
> Then I discovered "set enable_seqscan to off". It started using the index
> again. This is certainly not a case of it not being beneficial to use the
> index; using the index drops the query time from 260ms to 36ms!
>
> In most references I have seen, setting enable_seqscan to off is something
> only to be done for testing. But obviously if Postgres doesn't start
> behaving properly, I can't go to production without it. Is this the case
> or do many use this feature in production?
>
> Can you give Postgres index hints like you can do in Oracle and Sybase?

Ok, there are two cases where postgresql doesn't use an index but should.

One is where, due to a type mismatch, it can't; the other is where the
planner thinks the index scan will be slower, but in fact it would be
faster.
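
As a quick illustration of the first case (the table and column here are
made up): on these older versions, comparing an int2 column to a bare
integer literal typically defeats the index, but casting the literal to
match the column fixes it:

    -- "mytable" is hypothetical; "smallcol" is an indexed int2 column
    SELECT * FROM mytable WHERE smallcol = 42;        -- literal is int4, seq scan
    SELECT * FROM mytable WHERE smallcol = 42::int2;  -- types match, index scan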

The set enable_seqscan = off trick allows you to see which it is. Since
postgresql then used the index, it must be capable of using it; it just
doesn't realize it should.
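
For example (query and names made up, the idea is just to compare plans in
one session):

    EXPLAIN ANALYZE SELECT * FROM mytable WHERE a = 1 AND b = 2;  -- planner's choice
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT * FROM mytable WHERE a = 1 AND b = 2;  -- forced onto the index
    SET enable_seqscan = on;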

There are a few settings that tell postgresql how to decide which to use:

(All the following are in terms of comparison to a sequential page access)

random_page_cost:
This one tells the server how much a random page access costs versus a
sequential access. The default is 4. On machines with lots of cache / fast
drives / lots of drives in a RAID array / RAID array with cache (i.e.
things that speed up random access) you may want to set this lower. I run
mine at 1.4. Anything below 1.0 is unrealistic, but may be necessary to
force the right plan sometimes. As a global setting, I'd recommend
something between 1 and 2 for most servers.
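
To try a different value for just your session, or server wide (1.4 is
simply what works on my box, not a universal recommendation):

    SET random_page_cost = 1.4;
    -- or, for the whole server, in postgresql.conf:
    -- random_page_cost = 1.4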

effective_cache_size:
This tells the planner how big the kernel cache being used to cache
postgresql's data is. If postgresql is the only thing on the machine, then
it is likely that all of the kernel cache is being used for postgresql. A
higher value favors index scans, since it's then more likely the data will
be in memory, where random and sequential access cost the same (1.0).
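
For instance, if you figure roughly 1 gig of kernel cache is holding
postgresql data (the number is only an example, and note the setting is in
disk pages, 8k each by default):

    -- 1 GB / 8 kB per page = 131072 pages
    SET effective_cache_size = 131072;
    -- or in postgresql.conf:
    -- effective_cache_size = 131072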

cpu_tuple_cost: the cost of processing each tuple retrieved.
cpu_index_tuple_cost: each tuple's (additional?) cost when fetched through
an index. Lowering this favors index scans.
cpu_operator_cost: not sure, I think it's stuff like nestloop loop cost
and such.
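
If you want to play with those as well, something like this in a test
session (the value is purely illustrative, not a recommendation):

    SHOW cpu_index_tuple_cost;           -- see what you're running with now
    SET cpu_index_tuple_cost = 0.0005;   -- lower it, which favors index scans
    -- then rerun your EXPLAIN ANALYZE and see if the plan changes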

So, to start with, try changing random_page_cost. You can change it for
the current session only for testing, and try to find the "breakover
point" where it forces the planner to make the right decision.

Also, keep a large table around so you can do a select * from bigtable to
clear the caches, then run the original query and compare the performance
of seq versus index scans. You'll often find that a query that screams
when the caches are full of your data is quite slow when the cache is
empty.
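
Something like (bigtable being whatever large table you keep around for
this purpose):

    SELECT * FROM bigtable;            -- push your data out of the caches
    EXPLAIN ANALYZE SELECT * FROM mytable WHERE a = 1 AND b = 2;   -- cold cache timing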
