Skip site navigation (1) Skip section navigation (2)

Re: Select max(foo) and select count(*) optimization

From: Paul Tuckfield <paul(at)tuckfield(dot)com>
To: Christopher Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Select max(foo) and select count(*) optimization
Date: 2004-01-06 00:24:26
Message-ID: 1073348666.19994.3.camel@localhost.localdomain (view raw or flat)
Thread:
Lists: pgsql-performance
Not that I'm offering to do the porgramming mind you, :) but . . 


In the case of select count(*), one optimization is to do  a scan of the
primary key, not the table itself, if the table has a primary key. In a
certain commercial, lesser database, this is called an "index fast full
scan".  It would be important to scan the index in physical order
(sequential physical IO) and not in key order (random physical IO)

I'm guessing the payoff as well as real-world-utility of a max(xxx)
optimization are much higher than a count(*) optimization tho


On Mon, 2004-01-05 at 12:26, Christopher Browne wrote:
> Oops! siracusa(at)mindspring(dot)com (John Siracusa) was seen spray-painting on a wall:
> > Speaking of special cases (well, I was on the admin list) there are two
> > kinds that would really benefit from some attention.
> >
> > 1. The query "select max(foo) from bar" where the column foo has an
> > index.  Aren't indexes ordered?  If not, an "ordered index" would be
> > useful in this situation so that this query, rather than doing a
> > sequential scan of the whole table, would just "ask the index" for
> > the max value and return nearly instantly.
> >
> > 2. The query "select count(*) from bar" Surely the total number of
> > rows in a table is kept somewhere convenient.  If not, it would be
> > nice if it could be :) Again, rather than doing a sequential scan of
> > the entire table, this type of query could return instantly.
> >
> > I believe MySQL does both of these optimizations (which are probably
> > a lot easier in that product, given its data storage system).  These
> > were the first areas where I noticed a big performance difference
> > between MySQL and Postgres.
> >
> > Especially with very large tables, hearing the disks grind as
> > Postgres scans every single row in order to determine the number of
> > rows in a table or the max value of a column (even a primary key
> > created from a sequence) is pretty painful.  If the implementation
> > is not too horrendous, this is an area where an orders-of-magnitude
> > performance increase can be had.
> 
> These are both VERY frequently asked questions.
> 
> In the case of question #1, the optimization you suggest could be
> accomplished via some Small Matter Of Programming.  None of the people
> that have wanted the optimization have, however, offered to actually
> DO the programming.
> 
> In the case of #2, the answer is "surely NOT."  In MVCC databases,
> that information CANNOT be stored anywhere convenient because queries
> requested by transactions started at different points in time must get
> different answers.
> 
> I think we need to add these questions and their answers to the FAQ so
> that the answer can be "See FAQ Item #17" rather than people having to
> gratuitously explain it over and over and over again.


In response to

Responses

pgsql-performance by date

Next:From: Doug McNaughtDate: 2004-01-06 00:29:56
Subject: Re: Select max(foo) and select count(*) optimization
Previous:From: Stephan SzaboDate: 2004-01-05 22:33:17
Subject: Re: deferred foreign keys

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group