Re: AW: More Performance

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zeugswetter Andreas SB <ZeugswetterA(at)wien(dot)spardat(dot)at>
Cc: "'pgsql-hackers(at)postgresql(dot)org'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: AW: More Performance
Date: 2000-05-24 15:44:54
Message-ID: 7488.959183094@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Zeugswetter Andreas SB <ZeugswetterA(at)wien(dot)spardat(dot)at> writes:
> There was lots of discussion about this issue, and I was one of those
> who are 100% with you. The result was that Tom Lane tried to provide
> defaults, that do use the indexes when stats are missing. This does work
> quite well, only it seems to fail in this particular case.
> Imho we should look at exactly this query and try to find why it ignores
> the index, because it should not.

IIRC, the issue was that he had done VACUUM but not VACUUM ANALYZE,
with the result that the planner was working with default selectivity
estimates but non-default knowledge of the table size.

The default selectivity estimate for '=' is 0.01 (ie, 1% of the table
is expected to match). The MySQL benchmark creates a table that
has 300,000 rows and occupies about 3000 disk pages (24Mb).

With the current cost model for indexscans, the planner does not believe
that an indexscan is a good way to select 1% of the data in this table
--- and I think it's probably right, if you assume that the matching
rows are randomly distributed. Since there are about 100 rows per disk
block, there is going to be about one matching row per block, implying
that the indexscan is going to have to visit most of the table's pages
anyway. It's faster to do a seqscan because sequential reads are way
faster than random reads in a typical Unix environment.

We could ensure that the planner still picks an indexscan as the known
table size grows by reducing the default selectivity estimate a little
bit (I experimented and determined that 0.005 would do the trick, for
the current cost model parameters). That's pretty ad-hoc, but then
the 0.01 number is pretty ad-hoc too. It's probably better to be able
to say "if you haven't done VACUUM ANALYZE, you will get an indexscan
from WHERE col = const" than to have to say "it depends". Comments?

Of course the real issue here is that the true selectivity of '=' is
much smaller in this table, because the column being looked at is
unique. But the planner doesn't know that without VACUUM stats.

A hack I have been thinking about adding is that CREATE UNIQUE INDEX
for a single-column index should immediately force the attdisbursion
value for that column to "unique", so that the planner would know the
column is unique even without VACUUM ANALYZE. That would help not
at all for the MySQL benchmark (it has a two-column unique index,
but you can't conclude anything about the properties of either column
individually from that :-(). But it'd help in a lot of real-world
scenarios.

Comments anyone?

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Vince Vielhaber 2000-05-24 16:02:37 Re: [HACKERS] Re: interactive pgsql book
Previous Message gomathi raju 2000-05-24 15:37:27