Re: Select max(foo) and select count(*) optimization

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Select max(foo) and select count(*) optimization
Date: 2004-01-05 20:26:15
Message-ID: m31xqef8go.fsf@wolfe.cbbrowne.com
Lists: pgsql-performance
Oops! siracusa(at)mindspring(dot)com (John Siracusa) was seen spray-painting on a wall:
> Speaking of special cases (well, I was on the admin list) there are two
> kinds that would really benefit from some attention.
>
> 1. The query "select max(foo) from bar" where the column foo has an
> index.  Aren't indexes ordered?  If not, an "ordered index" would be
> useful in this situation so that this query, rather than doing a
> sequential scan of the whole table, would just "ask the index" for
> the max value and return nearly instantly.
>
> 2. The query "select count(*) from bar" Surely the total number of
> rows in a table is kept somewhere convenient.  If not, it would be
> nice if it could be :) Again, rather than doing a sequential scan of
> the entire table, this type of query could return instantly.
>
> I believe MySQL does both of these optimizations (which are probably
> a lot easier in that product, given its data storage system).  These
> were the first areas where I noticed a big performance difference
> between MySQL and Postgres.
>
> Especially with very large tables, hearing the disks grind as
> Postgres scans every single row in order to determine the number of
> rows in a table or the max value of a column (even a primary key
> created from a sequence) is pretty painful.  If the implementation
> is not too horrendous, this is an area where an orders-of-magnitude
> performance increase can be had.

These are both VERY frequently asked questions.

In the case of question #1, the optimization you suggest could be
accomplished via some Small Matter Of Programming.  None of the people
who have wanted the optimization have, however, offered to actually
DO the programming.
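
In the meantime, a commonly suggested workaround for #1 is to rewrite
the aggregate as an ORDER BY ... LIMIT 1 query, which a planner can
satisfy by reading one entry from the high end of an ordered index.
The sketch below only checks that the two forms return the same
value.  It uses Python's built-in sqlite3 module as a stand-in, not
PostgreSQL, so it says nothing about PostgreSQL's actual plans; the
index name bar_foo_idx and the sample values are made up for
illustration.

```python
import sqlite3

# sqlite3 is only a convenient stand-in here. The names bar and foo come
# from the question; the index name and sample data are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER)")
conn.execute("CREATE INDEX bar_foo_idx ON bar (foo)")
conn.executemany(
    "INSERT INTO bar (foo) VALUES (?)",
    [(n,) for n in (7, 42, 3, 19)],
)

# The aggregate form from the question.
max_via_aggregate = conn.execute("SELECT max(foo) FROM bar").fetchone()[0]

# The rewrite: sort descending on the indexed column and keep one row.
# The IS NOT NULL filter matters in PostgreSQL, where NULLs sort first
# in descending order, whereas max() ignores them.
max_via_order_by = conn.execute(
    "SELECT foo FROM bar WHERE foo IS NOT NULL ORDER BY foo DESC LIMIT 1"
).fetchone()[0]

print(max_via_aggregate, max_via_order_by)
```

Both queries should report the same value; the difference that
matters for performance is only in how the planner chooses to get
there.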

In the case of #2, the answer is "surely NOT."  In MVCC databases,
that information CANNOT be stored anywhere convenient, because
transactions started at different points in time see different sets
of rows, so the same count(*) query must be able to return a
different answer to each of them.
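
A toy model may make this concrete.  The sketch below is a deliberate
simplification, not PostgreSQL's real visibility logic: it assumes
every transaction has committed and that transaction ids alone decide
visibility.  The field names xmin and xmax echo PostgreSQL's system
columns, but RowVersion, visible() and count_rows() are hypothetical
helpers invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    xmin: int                   # id of the transaction that inserted the row
    xmax: Optional[int] = None  # id of the transaction that deleted it, if any


def visible(row: RowVersion, snapshot_xid: int) -> bool:
    """Simplified rule: the row was inserted at or before the snapshot
    and has not been deleted at or before the snapshot."""
    inserted = row.xmin <= snapshot_xid
    deleted = row.xmax is not None and row.xmax <= snapshot_xid
    return inserted and not deleted


def count_rows(table, snapshot_xid: int) -> int:
    # There is no single stored total to return: each snapshot has to
    # check which row versions it is allowed to see.
    return sum(1 for row in table if visible(row, snapshot_xid))


table = [
    RowVersion(xmin=1),
    RowVersion(xmin=1),
    RowVersion(xmin=2, xmax=4),  # inserted by txn 2, deleted by txn 4
    RowVersion(xmin=3),
    RowVersion(xmin=5),
]

# The same "select count(*)" asked under three different snapshots.
counts = {xid: count_rows(table, xid) for xid in (1, 3, 4)}
print(counts)
```

Under these assumptions the three snapshots get three different
counts from the same physical table, which is why one cached row
count cannot be correct for everybody at once.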

I think we need to add these questions and their answers to the FAQ so
that the answer can be "See FAQ Item #17" rather than people having to
gratuitously explain it over and over and over again.
-- 
(reverse (concatenate 'string "moc.enworbbc" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/finances.html
Rules of  the Evil Overlord #127.  "Prison guards will  have their own
cantina featuring  a wide  variety of tasty  treats that  will deliver
snacks to the  guards while on duty. The guards  will also be informed
that  accepting food or  drink from  any other  source will  result in
execution." <http://www.eviloverlord.com/>