Re: INDEX suggestion needed

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Thomas Beutin <tyrone(at)laokoon(dot)IN-Berlin(dot)DE>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: INDEX suggestion needed
Date: 2002-12-12 21:00:48
Message-ID: pgrhvus3d1nr8pp8311lopqd5uk4ffuir5@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, 12 Dec 2002 20:13:24 +0100, Thomas Beutin
<tyrone(at)laokoon(dot)IN-Berlin(dot)DE> wrote:
>tb=# VACUUM VERBOSE ANALYZE stat_pages;
>NOTICE: Pages 7809: Changed 0, Empty 0; Tup 343554: Vac 0, Keep 0, UnUsed 0.

>Aggregate (cost=3.19..3.19 rows=1 width=34) (actual time=52.89..52.89 rows=1 loops=1)
> -> Index Scan using tb5 on stat_pages (cost=0.00..3.18 rows=1 width=34) (actual time=52.74..52.74 rows=0 loops=1)
>Total runtime: 53.11 msec

>tb=# EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-06-01' AND visit <= '2002-12-11');

This selects (almost) all rows. An index cannot help.

>i got the following index/sequence scans by date ranges:
>(visit >= '2002-06-01' AND visit <= '2002-06-30') index scan

0 rows satisfy this condition, index scan is ok

>(visit >= '2002-06-01' AND visit <= '2002-07-31') index scan

Ca. 10000 rows, 3% of the whole table, index scan ok

>(visit >= '2002-06-01' AND visit <= '2002-08-31') sequence scan

32%, sequence scan is expected to be faster, unless tuples are almost
perfectly ordered by visit or most of the table (~ 8000 pages) fits
into the cache. How much physical memory is installed? What are your
shared_buffers and effective_cache_size settings? You might want to
experiment with
SET enable_seqscan = off;

>(visit >= '2002-07-01' AND visit <= '2002-07-31') index scan

Same as 06-01 to 07-31.

>(visit >= '2002-08-01' AND visit <= '2002-08-31') sequence scan

Ca. 29%

>(visit >= '2002-09-01' AND visit <= '2002-09-30') sequence scan

More than 50%

>(visit >= '2002-10-01' AND visit <= '2002-10-31') index scan, but long (>5sec)

Ca. 3%; interesting that it takes so long; I'd like to see EXPLAIN
ANALYZE output for enable_seqscan on and off.

>And: The date range in my table is from 2002-07-10 11:36:53+02 up to
>2002-10-29 23:31:47+01.

Yes, this is approximately reflected by the histogram bounds.

> attname | null_frac | avg_wi | n_distinct | correlation
>---------+-----------+--------+------------+-------------
> visit | 0 | 8 | -0.543682 | -0.972118

The negative correlation looks strange. How did you insert your data?

> m_id | 0 | 2 | 1 | 1
^^^
Only one distinct value in m_id? This explains why your m_id index is
never used.

> attname | most_common_vals
>---------+--------------------------
> visit | "2002-08-21 10:29:10+02", ...
> m_id | 35

> attname | histogram_bounds
>---------+--------------------------
> visit | "2002-07-25 16:37:12+02"
> "2002-08-15 12:36:18+02"
> "2002-08-23 12:36:15+02"
> "2002-08-29 17:30:54+02"
> "2002-09-05 12:54:31+02"
> "2002-09-10 18:03:54+02"
> "2002-09-16 15:44:56+02"
> "2002-09-20 14:34:40+02"
> "2002-09-24 13:59:29+02"
> "2002-09-29 09:09:31+02"
> "2002-10-29 23:25:13+01"
> m_id |
>

>??? Is this output ok?

Almost. I forgot to ask for most_common_freqs (cut'n'paste error).
But I don't expect any value of visit to occur much more than twice,
so these values should be irrelevant to our estimations.

Servus
Manfred

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Lincoln Yeoh 2002-12-12 21:02:28 Re: Docs: GIST
Previous Message Doug Fields 2002-12-12 20:38:32 Moving Indices to a different disk and various performance questions