Re: INDEX suggestion needed

From: Thomas Beutin <tyrone(at)laokoon(dot)IN-Berlin(dot)DE>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: INDEX suggestion needed
Date: 2002-12-17 13:40:58
Message-ID: 20021217144058.C13038@laokoon.bug.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

now i'm back on track; had some trouble with the tax at the and of the
year, and this consumes lot of time, most waiting rigth bevor a boring
office :-(

On Fri, Dec 13, 2002 at 06:37:59PM +0100, Manfred Koizar wrote:
> On Fri, 13 Dec 2002 13:00:14 -0300, Alvaro Herrera
> <alvherre(at)dcc(dot)uchile(dot)cl> wrote:
> >Now this catched my attention (in the questions' side, sorry, not the
> >answers'). Why the aggregate takes 10 times the time needed for the
> >indexscan?
>
> Good point!
>
> > One would think that a function like count() should be
> >pretty cheap,
>
> COUNT is cheap. But COUNT(DISTINCT something) is not trivial, it has
> to keep a list of all values it has already counted. I didn't look at
> the implementation. Do we have O(n^2) cost here?
>
> Thomas, could you EXPLAIN ANALYZE some test cases with
>
> SELECT COUNT(*) FROM (
> SELECT DISTINCT a_id
> FROM stat_pages
> WHERE ...
> ) AS x;
>
> and compare them to the results of SELECT COUNT(DISTINCT ...)?
Here we are:

EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-07-01' AND visit <= '2002-07-31');
NOTICE: QUERY PLAN:

Aggregate (cost=3.21..3.21 rows=1 width=34) (actual time=8.67..8.67 rows=1 loops=1)
-> Index Scan using tb5 on stat_pages (cost=0.00..3.21 rows=1 width=34) (actual time=0.08..2.00 rows=164 loops=1)
Total runtime: 8.86 msec

EXPLAIN ANALYZE SELECT COUNT(a_id) FROM ( SELECT DISTINCT a_id FROM stat_pages WHERE m_id = 35::smallint AND visit >= '2002-07-01' AND visit <= '2002-07-31' ) AS foo;
NOTICE: QUERY PLAN:

Aggregate (cost=3.22..3.22 rows=1 width=34) (actual time=9.42..9.42 rows=1 loops=1)
-> Subquery Scan foo (cost=3.22..3.22 rows=1 width=34) (actual time=7.25..9.39 rows=8 loops=1)
-> Unique (cost=3.22..3.22 rows=1 width=34) (actual time=7.24..9.34 rows=8 loops=1)
-> Sort (cost=3.22..3.22 rows=1 width=34) (actual time=7.24..7.66 rows=164 loops=1)
-> Index Scan using tb5 on stat_pages (cost=0.00..3.21 rows=1 width=34) (actual time=0.08..1.92 rows=164 loops=1)
Total runtime: 9.67 msec

EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-08-01' AND visit <= '2002-08-31');
NOTICE: QUERY PLAN:

Aggregate (cost=14103.16..14103.16 rows=1 width=34) (actual time=22864.95..22864.95 rows=1 loops=1)
-> Seq Scan on stat_pages (cost=0.00..13821.19 rows=112784 width=34) (actual time=3664.51..5145.32 rows=113612 loops=1)
Total runtime: 22889.60 msec

EXPLAIN ANALYZE SELECT COUNT(a_id) FROM ( SELECT DISTINCT a_id FROM stat_pages WHERE m_id = 35::smallint AND visit >= '2002-08-01' AND visit <= '2002-08-31' ) AS foo;
NOTICE: QUERY PLAN:

Aggregate (cost=28034.99..28034.99 rows=1 width=34) (actual time=25055.61..25055.61 rows=1 loops=1)
-> Subquery Scan foo (cost=27724.83..28006.79 rows=11278 width=34) (actual time=23467.83..25053.67 rows=555 loops=1)
-> Unique (cost=27724.83..28006.79 rows=11278 width=34) (actual time=23467.81..25050.34 rows=555 loops=1)
-> Sort (cost=27724.83..27724.83 rows=112784 width=34) (actual time=23467.80..23798.54 rows=113612 loops=1)
-> Seq Scan on stat_pages (cost=0.00..13821.19 rows=112784 width=34) (actual time=3936.11..5426.79 rows=113612 loops=1)
Total runtime: 25726.64 msec

EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-09-01' AND visit <= '2002-09-30');
NOTICE: QUERY PLAN:

Aggregate (cost=14293.93..14293.93 rows=1 width=34) (actual time=37803.08..37803.08 rows=1 loops=1)
-> Seq Scan on stat_pages (cost=0.00..13821.19 rows=189093 width=34) (actual time=461.37..5648.66 rows=195265 loops=1)
Total runtime: 37803.25 msec

EXPLAIN ANALYZE SELECT COUNT(a_id) FROM ( SELECT DISTINCT a_id FROM stat_pages WHERE m_id = 35::smallint AND visit >= '2002-09-01' AND visit <= '2002-09-30' ) AS foo;
NOTICE: QUERY PLAN:

Aggregate (cost=40763.98..40763.98 rows=1 width=34) (actual time=42325.52..42325.52 rows=1 loops=1)
-> Subquery Scan foo (cost=40243.98..40716.71 rows=18909 width=34) (actual time=39487.50..42323.18 rows=702 loops=1)
-> Unique (cost=40243.98..40716.71 rows=18909 width=34) (actual time=39487.48..42319.02 rows=702 loops=1)
-> Sort (cost=40243.98..40243.98 rows=189093 width=34) (actual time=39487.47..40079.07 rows=195265 loops=1)
-> Seq Scan on stat_pages (cost=0.00..13821.19 rows=189093 width=34) (actual time=624.73..5814.48 rows=195265 loops=1)
Total runtime: 44169.09 msec

EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-10-01' AND visit <= '2002-10-31');
NOTICE: QUERY PLAN:

Aggregate (cost=8792.18..8792.18 rows=1 width=34) (actual time=5419.84..5419.84 rows=1 loops=1)
-> Index Scan using tb5 on stat_pages (cost=0.00..8708.74 rows=33376 width=34) (actual time=62.75..1520.06 rows=29937 loops=1)
Total runtime: 5420.06 msec

EXPLAIN ANALYZE SELECT COUNT(a_id) FROM ( SELECT DISTINCT a_id FROM stat_pages WHERE m_id = 35::smallint AND visit >= '2002-10-01' AND visit <= '2002-10-31' ) AS foo;
NOTICE: QUERY PLAN:

Aggregate (cost=11812.87..11812.87 rows=1 width=34) (actual time=4755.42..4755.42 rows=1 loops=1)
-> Subquery Scan foo (cost=11721.09..11804.53 rows=3338 width=34) (actual time=4325.68..4753.70 rows=662 loops=1)
-> Unique (cost=11721.09..11804.53 rows=3338 width=34) (actual time=4325.66..4750.65 rows=662 loops=1)
-> Sort (cost=11721.09..11721.09 rows=33376 width=34) (actual time=4325.64..4401.14 rows=29937 loops=1)
-> Index Scan using tb5 on stat_pages (cost=0.00..8708.74 rows=33376 width=34) (actual time=0.21..473.04 rows=29937 loops=1)
Total runtime: 5242.39 msec

> So now you are back where you started. At least you have an index on
> "visit" now ;-)
Yes, but this is on the production system not used (pgversion 7.2.1), so i'm
looking for a low usage time slot to upgrade the system to at least 7.2.3.
Then i can start playing with the indicees. I think, i got an idea about the
usage of indicees.

Thanks for Your help!

Greetings,
-tb
--
Thomas Beutin tb(at)laokoon(dot)IN-Berlin(dot)DE
Beam me up, Scotty. There is no intelligent live down in Redmond.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ericson Smith 2002-12-17 15:19:43 7.3 Prepared statements
Previous Message Jessica Blank 2002-12-17 13:03:15 ERROR: fmgr_info: function 24809: cache lookup failed