Re: Index size

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Ioannis Theoharis <theohari(at)ics(dot)forth(dot)gr>
Cc: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-general(at)postgresql(dot)org
Subject: Re: Index size
Date: 2005-03-02 20:55:52
Message-ID: 20050302205535.GA12207@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Mar 02, 2005 at 10:08:58PM +0200, Ioannis Theoharis wrote:
> I have a relation like this: (att0 varchar(1000), att1 int4)
>
> i create a b-tree index on att1 ()
> i cluster my raltion according to index
>
> now i have a query
> select *
> form tc2000000000
> where att1<=900000000 and att1>=0 ;
>
> As far as i can see from explain analyze an index scan is used:
> Index Scan using inst_id_idx on tc2000000000
> Index Cond: ((att1 <= 900000000) AND (att1 >= 0))
>
> If for each entry in table, an entry in index is beeing held, then the
> index size is populated too fast.
>
> I guess, that postgres uses index to find the first entry satisfying the
> index conition, after find the last one and then do a sequential scan on
> the appropriate fraction of the table (to take advantage of physical
> clustering).

What makes you think that? Clustering is nice, but postgresql needs to
get the right answer and that the table in clustered is not something
postgresql can rely on. It uses the index to find *every* row you're
looking for, there's no shortcut here.

> In my case, discrete values on att1 are orders of magnitude less than
> number of table raws.
>
> Thus, the big index size is useless for me. I want to avoid the overhead
> of scanning such a big index, just permitting ONLY the discrete values to
> entry in index. In such a way the whole scenario i presented before for
> how i guess, that postgres evaluates my query, is still in use.

There's no special relationship between two rows with the same att1.
Either you find the rows by using an index for each row, or scanning
the whole table. There's no inbetween. The only thing clustering
acheives is that due to values being together, the chance that
succeeding indexes entries will already have been loaded is higher,
thus reducing the overall cost.
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Peter Eisentraut 2005-03-02 20:56:52 Re: pgpool
Previous Message Arcane_Rhino 2005-03-02 20:10:42 Fwd: Re: [GENERAL] pg_shadow passwd decrypt