Re: [HACKERS] Costs: Index vs Non-Index

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: The Hermit Hacker <scrappy(at)hub(dot)org>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Costs: Index vs Non-Index
Date: 2000-01-08 16:15:16
Message-ID: 6112.947348116@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The Hermit Hacker <scrappy(at)hub(dot)org> writes:
> Hrmmm...if I'm reading this right, its more costly to create an index then
> to leave it as a sequential scan, but it returns more rows? Yet, it
> returns, if I do the query with a count() around the return value, 288
> rows, not 334 or 1154...

This doesn't have anything to do with index vs sequential scan, but it
does have to do with whether you've done a VACUUM ANALYZE lately.
You haven't ;-)

> udmsearch=> explain select next_index_time from url where next_index_time <= 947317073;
> NOTICE: QUERY PLAN:
> Seq Scan on url (cost=43.00 rows=334 width=4)

IIRC, rows=334 is the default estimate of result rows you will get for
this query in the absence of any information whatever. (Default table
size guess is 1000 rows, and default selectivity guess for <= is 1/3,
so...) If you have not vacuumed, it's sheer coincidence that this is
even within hailing distance of the correct figure of 288.

> udmsearch=> create index url_next_index_time on url using btree ( next_index_time);
> CREATE
> udmsearch=> explain select next_index_time from url where next_index_time <= 947317073;
> NOTICE: QUERY PLAN:
> Index Scan using url_next_index_time on url (cost=271.68 rows=1154 width=4)

I believe that a side-effect of CREATE INDEX is to update the
number-of-pages-and-rows statistics in pg_class for the target table.
So after you do that, the optimizer has a correct idea of the table's
size, but still no more info about the selectivity of the WHERE clause.
(I infer that your table has size 1154*3 rows.) If you now drop the
index and repeat EXPLAIN, it'll go back to a seq scan, but it will now
say 1154 rows --- and the cost estimate will be higher, too.

If you do VACUUM ANALYZE, then the optimizer will also know the min and
max values of next_index_time, and will have some shot at making a
correct estimate of the output row count. I'd be interested to know
what it predicts then...

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2000-01-08 16:23:17 Re: [HACKERS] Another index "buglet"?
Previous Message Bruce Momjian 2000-01-08 16:10:01 Re: [HACKERS] ECPG patch for exec sql ifdef etc.