Re: Seqscan in MAX(index_column)

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Christopher Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Seqscan in MAX(index_column)
Date: 2003-09-05 02:02:45
Message-ID: 200309050202.h8522jr06156@candle.pha.pa.us
Lists: pgsql-hackers

Christopher Browne wrote:
> > IMHO portability is an important point. People are used to MAX() and
> > COUNT(*), and will be surprised that they need some special
> > treatment. While the reasons for this are perfectly explainable,
> > speeding up these aggregates with some extra effort would make porting
> > a bit easier.
>
> The availability of cleverness with MAX()/MIN() is no grand surprise;
> it would be very nice to get some expansion of that to "SELECT VALUE
> FROM TABLE WHERE (CRITERIA) ORDER BY VALUE DESCENDING LIMIT 1;"
>
> But I'm _very_ curious as to what the anticipated treatment to collect
> COUNT() more efficiently would be. I would expect that it would only
> be able to get tuned much more if there's NO "where" clause, so that
> it could use some ("magically-kept-up-to-date") stats on table size.
>
> I don't see any way to optimize COUNT when numbers of rows can
> continually vary. Storing stats somewhere will just make updates more
> expensive. And if those stats are for the table, that doesn't help me
> if I want "COUNT(*) FROM TABLE WHERE UPDATED_ON BETWEEN NOW() - '1
> day' and NOW()".

Yes, COUNT(*) would use the cached stats only when the query has no
WHERE clause.

My idea is that a transaction doing a COUNT(*) would first look to see
whether a visible cached value already exists, and if not, it would do
the COUNT(*) and insert the result into the cache table. Any
INSERT/DELETE would remove the value from the cache. As I see it, the
commit of the INSERT/DELETE transaction would then auto-invalidate the
cache at the exact time the transaction commits. This would allow MVCC
visibility of the counts.
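
To make the invalidate-on-write idea concrete, here is a rough sketch in
Python using the stdlib sqlite3 module as a stand-in engine. It is not
PostgreSQL code and does not model MVCC snapshots; it only shows the
lookup, fill, and invalidate steps. The table names (items, count_cache),
the trigger names, and the cached_count() helper are all made up for
illustration.

```python
import sqlite3


def setup_invalidating_cache(conn):
    """Create a demo table plus a cache table that writes invalidate.

    All object names here are hypothetical, chosen for the example.
    """
    conn.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, val INTEGER);
        CREATE TABLE count_cache (
            table_name TEXT PRIMARY KEY,
            row_count  INTEGER NOT NULL
        );
        -- Any INSERT or DELETE removes the cached value. The removal runs
        -- inside the writing transaction, so it takes effect at commit.
        CREATE TRIGGER items_ins AFTER INSERT ON items
        BEGIN
            DELETE FROM count_cache WHERE table_name = 'items';
        END;
        CREATE TRIGGER items_del AFTER DELETE ON items
        BEGIN
            DELETE FROM count_cache WHERE table_name = 'items';
        END;
    """)


def cached_count(conn):
    """Return (row_count, served_from_cache) for the items table."""
    row = conn.execute(
        "SELECT row_count FROM count_cache WHERE table_name = 'items'"
    ).fetchone()
    if row is not None:
        return row[0], True
    # No cached value: do the real COUNT(*) and remember the result.
    (n,) = conn.execute("SELECT COUNT(*) FROM items").fetchone()
    conn.execute("INSERT INTO count_cache VALUES ('items', ?)", (n,))
    conn.commit()
    return n, False
```

The first call after any write pays for the full scan; later calls read
one row until the next INSERT or DELETE commits.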

A trickier idea would be for INSERT/DELETE to UPDATE the cached value.
It might be possible to always have a valid cache value for COUNT(*).
(COPY would also need to update the cache.)
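
The update-in-place variant can be sketched the same way, again with
Python's sqlite3 as a stand-in and with hypothetical names (items,
count_cache, maintained_count). Here the triggers adjust the cached row
instead of deleting it, so a value is always present. A bulk-load path
such as COPY would need an equivalent hook, which this sketch does not
cover.

```python
import sqlite3


def setup_maintained_count(conn):
    """Create a demo table whose row count is kept current by triggers.

    All object names here are hypothetical, chosen for the example.
    """
    conn.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, val INTEGER);
        CREATE TABLE count_cache (
            table_name TEXT PRIMARY KEY,
            row_count  INTEGER NOT NULL
        );
        -- Seed the cache: the table starts empty.
        INSERT INTO count_cache VALUES ('items', 0);
        -- Each row inserted or deleted adjusts the cached value in the
        -- same transaction, so a rollback undoes the adjustment too.
        CREATE TRIGGER items_ins AFTER INSERT ON items
        BEGIN
            UPDATE count_cache SET row_count = row_count + 1
            WHERE table_name = 'items';
        END;
        CREATE TRIGGER items_del AFTER DELETE ON items
        BEGIN
            UPDATE count_cache SET row_count = row_count - 1
            WHERE table_name = 'items';
        END;
    """)


def maintained_count(conn):
    """Read the always-present cached count for the items table."""
    (n,) = conn.execute(
        "SELECT row_count FROM count_cache WHERE table_name = 'items'"
    ).fetchone()
    return n
```

The cost moves to the writers: every INSERT and DELETE now also updates
the single cache row, which is the extra update expense raised in the
quoted message.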

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
