
Re: [HACKERS] Optimizer fails?

From: dg(at)illustra(dot)com (David Gould)
To: mimo(at)interdata(dot)com(dot)pl (Michal Mosiewicz)
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Optimizer fails?
Date: 1998-03-25 05:48:59
Lists: pgsql-hackers
Michal Mosiewicz asks:
> xxxx=> \d logdt
> Table    = logdt
> +----------------------------------+----------------------------------+-------+
> |              Field               |              Type                | Length|
> +----------------------------------+----------------------------------+-------+
> | dt                               | timestamp                        |     4 |
> +----------------------------------+----------------------------------+-------+
> xxxx=> explain select * from log where dt < '19980203';
> Seq Scan on log  (cost=105832.15 size=699588 width=62)
> There is an index on log table, dt field. The index is b-tree.
> However it doesn't seem to be used. (Of course I have vacuumed it). The
> table has about 2M records. I don't think that Seq Scan is a good idea.

This is almost certainly correct optimizer behavior, unless there are very few
rows that would be qualified by "< '19980203'". Do you know how many rows
were returned?

The key to all this is that scanning sequentially is very cheap. Disks like
it, the OS filesystem is optimized for it, striping across multiple devices
is a big win... It is easy to scan 10 Mb per second on PC class hardware.

Using an index means using random I/O for each row. Unless the target table
fits in the buffer cache this means you are limited to examining a few
dozen rows per second (good disks can sustain about 80 to 100 random IOs
per second).

Back of the envelope calculation:


     rowsize = 100 bytes.
     Sequential I/O rate = 1Mb/second.
     Random I/O rate = 50 IO/second.


     2M rows @ 10,000 rows per Mb = 200 MB = 200 seconds.

     200 seconds * 50 IO per second = 10,000 IOs


     If less than 10,000 rows (0.5 % of the table) would qualify, the index
     scan might be faster.  Otherwise the table scan is optimal.

This calculation ignores the overhead of actually scanning the index and
probably underestimates the sequential I/O rate, so in real life the table
scan may be even better than this.
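The back-of-the-envelope numbers above can be sketched as a small script; the
row size and I/O rates are the (1998-era) assumptions from the text, not
measurements:

```python
# Break-even estimate: index scan vs. sequential scan.
ROW_SIZE = 100            # bytes per row (assumed)
SEQ_RATE = 1_000_000      # sequential I/O in bytes/second (assumed)
RAND_IOS = 50             # random I/Os per second (assumed)
NUM_ROWS = 2_000_000

table_bytes = NUM_ROWS * ROW_SIZE              # 200 MB
seq_scan_seconds = table_bytes / SEQ_RATE      # 200 seconds to read it all

# In those same 200 seconds, random I/O can fetch this many rows
# (one random I/O per qualifying row, ignoring index-page reads):
break_even_rows = seq_scan_seconds * RAND_IOS  # 10,000 rows

print(f"sequential scan: {seq_scan_seconds:.0f} s")
print(f"index scan wins below ~{break_even_rows:,.0f} rows "
      f"({break_even_rows / NUM_ROWS:.1%} of the table)")
```

Running it reproduces the 10,000-row (0.5%) cutover point in the text.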

> Also, what if I aggregate it on dt field to count(*) or sum some values.
> It would be sequentially scanned, then sorted, then grouped and finally
> aggregated, right?
> Well, sometimes it may be good enough. But if this table is big enough,
> it would be wiser to use index to select ranges from the table and then
> aggregate those values without sorting.

Assuming a table and query like:

   create table foo (d date, m money);  -- 2M rows of this, index on d.

   select d, sum(m) from foo
      group by d;

I would be very surprised if there existed a better plan than to sort the
table and then scan the sorted table (once) to do the grouping and
aggregation.

If you knew the number of 'd' values was smallish, you might try scanning
the table and building a hash containing the d values and aggregates.

I don't know if postgreSQL has this strategy. It is fairly uncommon to
actually find it implemented in real systems.
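The two strategies can be sketched side by side; the table foo(d, m) is
simulated here as a list of tuples, and the data values are made up for
illustration:

```python
# Hash aggregation vs. sort-then-group for: select d, sum(m) from foo group by d
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

rows = [("1998-03-01", 10), ("1998-03-02", 5),
        ("1998-03-01", 7), ("1998-03-03", 2)]

def hash_group_sum(rows):
    # Single scan, no sort; memory grows with the number of distinct
    # 'd' values, so it only pays off when that count is smallish.
    sums = defaultdict(int)
    for d, m in rows:
        sums[d] += m
    return dict(sums)

def sort_group_sum(rows):
    # Sort by d, then one pass over the sorted rows to group and sum.
    out = {}
    by_d = itemgetter(0)
    for d, grp in groupby(sorted(rows, key=by_d), key=by_d):
        out[d] = sum(m for _, m in grp)
    return out

assert hash_group_sum(rows) == sort_group_sum(rows)
```

Both produce the same groups; the trade is the O(n log n) sort against the
memory needed to hold one accumulator per distinct grouping value.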

> Once I saw index based agregates in the TODO list, but somehow it
> disappeared.

Do you mean using covering indexes? This would be very worthwhile, but might
be a bit of a job.


David Gould            dg(at)illustra(dot)com           510.628.3783 or 510.305.9468 
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - Linux. Not because it is free. Because it is better.
