Re: Select max(foo) and select count(*) optimization

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Select max(foo) and select count(*) optimization
Date: 2004-01-05 20:26:15
Message-ID: m31xqef8go.fsf@wolfe.cbbrowne.com
Lists: pgsql-performance

Oops! siracusa(at)mindspring(dot)com (John Siracusa) was seen spray-painting on a wall:
> Speaking of special cases (well, I was on the admin list) there are two
> kinds that would really benefit from some attention.
>
> 1. The query "select max(foo) from bar" where the column foo has an
> index. Aren't indexes ordered? If not, an "ordered index" would be
> useful in this situation so that this query, rather than doing a
> sequential scan of the whole table, would just "ask the index" for
> the max value and return nearly instantly.
>
> 2. The query "select count(*) from bar". Surely the total number of
> rows in a table is kept somewhere convenient. If not, it would be
> nice if it could be :) Again, rather than doing a sequential scan of
> the entire table, this type of query could return instantly.
>
> I believe MySQL does both of these optimizations (which are probably
> a lot easier in that product, given its data storage system). These
> were the first areas where I noticed a big performance difference
> between MySQL and Postgres.
>
> Especially with very large tables, hearing the disks grind as
> Postgres scans every single row in order to determine the number of
> rows in a table or the max value of a column (even a primary key
> created from a sequence) is pretty painful. If the implementation
> is not too horrendous, this is an area where an orders-of-magnitude
> performance increase can be had.

These are both VERY frequently asked questions.

In the case of question #1, the optimization you suggest could be
accomplished via some Small Matter Of Programming. None of the people
who have wanted the optimization have, however, offered to actually
DO the programming.
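A toy Python sketch of the idea behind #1, assuming a sorted list as a stand-in for a btree index (the class and function names are hypothetical, not PostgreSQL internals). Once the keys are kept in order, the maximum is simply the last entry, whereas a sequential scan has to look at every row. In SQL terms, the commonly cited workaround is to write `SELECT foo FROM bar WHERE foo IS NOT NULL ORDER BY foo DESC LIMIT 1`, which the planner can answer by reading a btree index on foo from its high end. The IS NOT NULL matters because NULLs sort first under DESC.

```python
import bisect


class SortedIndex:
    """Hypothetical toy "ordered index": keys are kept sorted on insert."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        # bisect.insort places the key so the list stays ordered
        bisect.insort(self.keys, key)

    def max(self):
        # The largest key sits at the right-hand end: no scan required
        return self.keys[-1] if self.keys else None


def seq_scan_max(rows):
    """What a sequential scan does: examine every row, skipping NULLs."""
    best = None
    for value in rows:
        if value is not None and (best is None or value > best):
            best = value
    return best
```

The point of the sketch is the access pattern: the index answers from one end of an already-ordered structure, while the scan's cost grows with the size of the table.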

In the case of #2, the answer is "surely NOT." In MVCC databases,
that information CANNOT be stored anywhere convenient, because queries
issued by transactions that started at different points in time may
see different sets of committed rows, and so must get different
answers.
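A deliberately simplified Python model of why #2 holds, assuming each row version records the transaction that created it (xmin) and, if deleted, the one that deleted it (xmax), and that every finished transaction committed (aborts and a transaction's view of its own changes are ignored). The names are hypothetical and this is not PostgreSQL's actual visibility code. Three snapshots of the same five row versions get three different counts, so no single stored counter could serve them all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    # Hypothetical simplified tuple header: creating and deleting transaction
    xmin: int
    xmax: Optional[int] = None


@dataclass(frozen=True)
class Snapshot:
    # Transactions with IDs below xmax that are not still in progress
    # are treated as committed from this snapshot's point of view.
    xmax: int
    in_progress: frozenset = frozenset()

    def sees_committed(self, xid: int) -> bool:
        return xid < self.xmax and xid not in self.in_progress


def visible(row: RowVersion, snap: Snapshot) -> bool:
    if not snap.sees_committed(row.xmin):
        return False  # inserter had not committed as far as we can tell
    if row.xmax is not None and snap.sees_committed(row.xmax):
        return False  # deleter committed before our snapshot: row is gone
    return True


def count_star(table, snap: Snapshot) -> int:
    # count(*) must check visibility row by row for *this* snapshot
    return sum(1 for row in table if visible(row, snap))


table = [
    RowVersion(xmin=1),
    RowVersion(xmin=2),
    RowVersion(xmin=3, xmax=5),  # later deleted by transaction 5
    RowVersion(xmin=6),
    RowVersion(xmin=6),
]

early = Snapshot(xmax=4)                              # after txns 1-3
late = Snapshot(xmax=7)                               # after txns 1-6
racing = Snapshot(xmax=7, in_progress=frozenset({6}))  # txn 6 still running
```

Here count_star gives 3 for early, 4 for late, and 2 for racing: the same table, three correct answers.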

I think we need to add these questions and their answers to the FAQ so
that the answer can be "See FAQ Item #17" rather than people having to
gratuitously explain it over and over and over again.
--
(reverse (concatenate 'string "moc.enworbbc" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/finances.html
Rules of the Evil Overlord #127. "Prison guards will have their own
cantina featuring a wide variety of tasty treats that will deliver
snacks to the guards while on duty. The guards will also be informed
that accepting food or drink from any other source will result in
execution." <http://www.eviloverlord.com/>
