Re: Index of a table is not used (in any case)

From: mlw <markw(at)mohawksoft(dot)com>
To: Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>
Cc: Doug McNaught <doug(at)wireboard(dot)com>, Reiner Dassing <dassing(at)wettzell(dot)ifag(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index of a table is not used (in any case)
Date: 2001-10-24 12:59:34
Message-ID: 3BD6BB36.6B85376E@mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Zeugswetter Andreas SB SD wrote:
>
> > > > *very* slow, due to seq scan on
> > > > 20 million entries, which is a test setup up to now)
> > >
> > > Perennial first question: did you VACUUM ANALYZE?
> >
> > Can there, or could there, be a notion of "rule based" optimization of
> > queries in PostgreSQL? The "not using index" problem is probably the
> most
> > common and most misunderstood problem.
>
> There is a (sort of) rule based behavior in PostgreSQL,
> the down side of the current implementation is, that certain
> other commands than ANALYZE (e.g. "create index") partly update
> optimizer statistics. This is bad behavior, since then only part
> of the statistics are accurate. Statistics always have to be seen
> in context to other table's and other index'es statistics.
>
> Thus, currently the rule based optimizer only works if you create
> the indexes on empty tables (before loading data), which obviously
> has downsides. Else you have no choice but to ANALYZE frequently.
>
> I have tried hard to fight for this pseudo rule based behavior,
> but was only partly successful in convincing core. My opinion is,
> that (unless runtime statistics are kept) no other command than
> ANALYZE should be allowed to touch optimizer relevant statistics
> (maybe unless explicitly told to).

Perhaps there could be an extension to ANALYZE, i.e. ANALYZE RULEBASED
tablename that would restore or recalculate the state that a table would be if
all indexes were created from scratch?

The "not using index" was very frustrating to understand. The stock answer,
"did you vacuum?" just isn't enough. There has to be some explanation (in the
FAQ or something) about the indexed key distribution in your data. Postgres'
statistics are pretty poor too, a relative few very populous entries in a table
will make it virtually impossible for the cost based optimizer (CBO) to use an
index.

At my site we have lots of tables that have many duplicate items in an index.
It is a music based site and has a huge amount of "Various Artists" entries. No
matter what we do, there is NO way to get Postgres to use the index from the
query alone. We have over 20 thousand artists, but 5 "Various Artists" or
"Soundtrack" entries change the statistics so much that they exclude an index
scan. We have to run the system with sequential scan disabled. Running with seq
disabled eliminates the usefulness of the CBO because when it is a justified
table scan, it does an index scan.

I have approached this windmill before and a bit regretful at bringing it up
again, but it is important, very important. There needs to be a way to direct
the optimizer about how to optimize the query.

Using "set foo=bar" prior to a query is not acceptable. Web sites use
persistent connections to the databases and since "set" can not be restored,
you override global settings for the session, or have to code, in the web page,
the proper default setting. The result is either that different web processes
will behave differently depending on the order in which they execute queries,
or you have to have your DBA write web pages.

A syntax like:

select * from table where /* enable_seqscan = false */ key = 'value';

Would be great in that you could tune the optimizer as long as the settings
were for the clause directly following the directive, without affecting the
state of the session or transaction. For instance:

select id from t1, t2 where /* enable_seqscan = false */ t1.key = 'value' and
t2.key = 'test' and t1.id = t2.id;

The where "t1.key = 'value'" condition would be prohibited from using a
sequntial scan, while the "t2.key = 'test'" would use it if it made sense.

Is this possible?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kiran Kumar Gahlot 2001-10-24 13:01:49 check disk space
Previous Message Vince Vielhaber 2001-10-24 12:01:04 between?