Skip site navigation (1) Skip section navigation (2)

Re: Select max(foo) and select count(*) optimization

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Select max(foo) and select count(*) optimization
Date: 2004-01-06 02:31:29
Message-ID: m3r7yddczi.fsf@wolfe.cbbrowne.com (view raw or flat)
Thread:
Lists: pgsql-performance
pg(at)rbt(dot)ca (Rod Taylor) wrote:
>> Especially with very large tables, hearing the disks grind as Postgres scans
>> every single row in order to determine the number of rows in a table or the
>> max value of a column (even a primary key created from a sequence) is pretty
>> painful.  If the implementation is not too horrendous, this is an area where
>> an orders-of-magnitude performance increase can  be had.
>
> Actually, it's very painful. For MySQL, they've accepted the concurrancy
> hit in order to accomplish it -- PostgreSQL would require a more subtle
> approach.
>
> Anyway, with Rules you can force this:
>
> ON INSERT UPDATE counter SET tablecount = tablecount + 1;
>
> ON DELETE UPDATE counter SET tablecount = tablecount - 1;
>
> You need to create a table "counter" with a single row that will keep
> track of the number of rows in the table. Just remember, you've now
> serialized all writes to the table, but in your situation it may be
> worth while.

There's a still more subtle approach that relieves the serialization
constraint, at some cost...

- You add rules that _insert_ a row each time there is an
  insert/delete
   ON INSERT insert into counts(table, value) values ('our_table', 1);
   ON DELETE insert into counts(table, value) values ('our_table', -1);

- The "select count(*) from our_table" is replaced by "select
  sum(value) from counts where table = 'our_table'"

- Periodically, a "compression" process goes through and either:

    a) Deletes the rows for 'our_table' and replaces them with one
       row with a conventionally-scanned 'count(*)' value, or

    b) Computes "select table, sum(value) as value from counts group
       by table", deletes all the existing rows in counts, and replaces
       them by the preceding selection, or

    c) Perhaps does something incremental that's like b), but which
       only processes parts of the "count" table at once.  Process
       500 rows, then COMMIT, or something of the sort...

Note that this "counts" table can potentially grow _extremely_ large.
The "win" comes when it gets compressed, so that instead of scanning
through 500K items, it index-scans through 27, the 1 that has the
"497000" that was the state of the table at the last compression, and
then 26 singletons.

A win comes in if an INSERT that adds in 50 rows can lead to
inserting ('our_table', 50) into COUNTS, or a delete that eliminates
5000 rows puts in ('our_table', -5000).

It's vital to run the "compression" reasonably often (much like VACUUM
:-)) in order that the COUNTS summary table stays relatively small.
-- 
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/wp.html
Debugging is twice  as hard as writing   the code in the first  place.
Therefore, if you write the code as cleverly as  possible, you are, by
definition, not smart enough to debug it.  -- Brian W. Kernighan

In response to

pgsql-performance by date

Next:From: Bruce MomjianDate: 2004-01-06 05:25:32
Subject: Re: optimizing Postgres queries
Previous:From: Christopher BrowneDate: 2004-01-06 01:46:00
Subject: Re: Select max(foo) and select count(*) optimization

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group