Re: Index of a table is not used (in any case)

From: "Zeugswetter Andreas SB SD" <ZeugswetterA(at)spardat(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index of a table is not used (in any case)
Date: 2001-10-25 14:24:25
Message-ID: 46C15C39FEB2C44BA555E356FBCD6FA41EB3D9@m0114.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane writes:
> "Zeugswetter Andreas SB SD" <ZeugswetterA(at)spardat(dot)at> writes:
> > Imho one of the biggest sources for problems is people creating new
> > indexes on populated tables when the rest of the db/table has badly
> > outdated statistics or even only default statistics in place.
> > In this situation the optimizer is badly misguided, because it now
> > sees completely inconsistent statistics to work on.
> > (e.g. old indexes on that table may seem way too cheap compared
> > to table scan)
>
> I don't think any of this is correct. We don't have per-index
> statistics. The only stats updated by CREATE INDEX are the same ones
> updated by plain VACUUM, viz the number-of-tuples and number-of-pages
> counts in pg_class.

1. Have I said anything about other stats, than relpages and reltuples ?

2. There is only limited use in the most accurate pg_statistics if
reltuples
and relpages is completely off. In the current behavior you eg get:

rel1: pages = 100000 -- updated from "create index"
index1 pages = 2 -- outdated
index2 pages = 2000 -- current

rel2: pages = 1 -- outdated

--> Optimizer will prefer join order: rel2, rel1

> I believe it's reasonable to update those stats
> more often than the pg_statistic stats (in fact, if we could keep them
> constantly up-to-date at a reasonable cost, we'd do so).

There is a whole lot of difference between keeping them constantly up to

date and modifying (part of) them in the "create index" command, so I do

not counter your above sentence, but imho the conclusion is wrong.

> The
> pg_statistic stats are designed as much as possible to be independent
> of the absolute number of rows in the table, so that it's okay if they
> are out of sync with the pg_class stats.

Independently, they can only be good for choosing whether to use an
index or seq scan. They are not sufficient to choose a good join order.

> The major reason why "you vacuumed but you never analyzed" is such a
> killer is that in the absence of any pg_statistic data, the default
> selectivity estimates are such that you may get either an index or seq
> scan depending on how big the table is. The cost estimates are
> nonlinear (correctly so, IMHO, though I wouldn't necessarily
> defend the
> exact shape of the curve) and ye olde default 0.01 will give you an
> indexscan for a small table but not for a big one. In 7.2 I have
> reduced the default selectivity estimate to 0.005, for a number of
> reasons but mostly to get it out of the range where the decision will
> flip-flop.

Yes, the new selectivity is better, imho even still too high.
Imho the strategy should be to assume a good selectivity
of values in absence of pg_statistics evidence.
If the index was not selective enough for an average query, the
dba should not have created the index in the first place.

> test71=# create table foo (f1 int);
> test71=# create index fooi on foo(f1);
> test71=# explain select * from foo where f1 = 42;

> Index Scan using fooi on foo (cost=0.00..8.14 rows=10 width=4)

> test71=# update pg_class set reltuples = 100000, relpages =
> 1000 where relname = 'foo';
> Index Scan using fooi on foo (cost=0.00..1399.04 rows=1000 width=4)

> test71=# update pg_class set reltuples = 1000000, relpages =
> 10000 where relname = 'foo';

> Seq Scan on foo (cost=0.00..22500.00 rows=10000 width=4)

> In current sources you keep getting an indexscan as you increase the
> number of tuples...

As you can see it toppeled at 10 Mio rows :-(

Andreas

Browse pgsql-hackers by date

  From Date Subject
Next Message tweekie 2001-10-25 14:46:58 java virtual machine
Previous Message Hannu Krosing 2001-10-25 14:13:58 Re: timeout for "idle in transaction"