From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Christopher Browne <cbbrowne(at)acm(dot)org> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Seqscan in MAX(index_column) |
Date: | 2003-09-05 02:02:45 |
Message-ID: | 200309050202.h8522jr06156@candle.pha.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Christopher Browne wrote:
> > IMHO portability is an important point. People are used to MAX() and
> > COUNT(*), and will be surprised that they need some special
> > treatment. While the reasons for this are perfectly explainable,
> > speeding up these aggregates with some extra effort would make porting
> > a bit easier.
>
> The availability of cleverness with MAX()/MIN() is no grand surprise;
> it would be very nice to get some expansion of that to "SELECT VALUE
> FROM TABLE WHERE (CRITERIA) ORDER BY VALUE DESCENDING LIMIT 1;"
>
> But I'm _very_ curious as to what the anticipated treatment to collect
> COUNT() more efficiently would be. I would expect that it would only
> be able to get tuned much more if there's NO "where" clause, so that
> it could use some ("magically-kept-up-to-date") stats on table size.
>
> I don't see any way to optimize COUNT when numbers of rows can
> continually vary. Storing stats somewhere will just make updates more
> expensive. And if those stats are for the table, that doesn't help me
> if I want "COUNT(*) FROM TABLE WHERE UPDATED_ON BETWEEN NOW() - '1
> day' and NOW()".
Yes, count would only use the cached stats for non-WHERE clause
COUNT(*).
My idea is that if a transaction doing a COUNT(*) would first look to
see if there already was a visible cached value, and if not, it would do
the COUNT(*) and insert into the cache table. Any INSERT/DELETE would
remove the value from the cache. As I see it, the commit of the
INSERT/DELETE transaction would then auto-invalidate the cache at the
exact time the transaction commits. This would allow MVCC visibility of
the counts.
A trickier idea would be for INSERT/DELETE to UPDATE the cached value.
It might be possible to always have a valid cache value for COUNT(*).
(COPY would also need to update the cache.)
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
From | Date | Subject | |
---|---|---|---|
Next Message | Neil Conway | 2003-09-05 02:16:50 | Re: Seqscan in MAX(index_column) |
Previous Message | Tom Lane | 2003-09-05 01:51:59 | Re: TCP/IP with 7.4 beta2 broken? |