Re: again on index usage

From: Daniel Kalchev <daniel(at)digsys(dot)bg>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: again on index usage
Date: 2002-01-10 09:39:31
Message-ID: 200201100939.LAA28676@dcave.digsys.bg
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>Tom Lane said:
> Daniel Kalchev <daniel(at)digsys(dot)bg> writes:
> > In any case, if we need to hit 50k pages (assuming the indexed data is
> > randomly scattered in the file), and having to read these three times each
, it
> > will be less I/O than having to read 1.7 million records.
>
> How do you arrive at that? Assuming 100 records per page (probably the
> right order of magnitude), the seqscan alternative is 17k page reads.
> Yes, you examine more tuples, but CPUs are lots faster than disks.

I tried this:

db=# select * into iplog_test from iplog_gate200112;
SELECT
db=# create index iplog_test_ipaddr_idx on iplog_test(ipaddr);
CREATE
db=# cluster iplog_test_ipaddr_idx on iplog_test;
CLUSTER
db=# create index iplog_test_ipdate_idx on iplog_test(ipdate);
CREATE
db=# vacuum verbose analyze iplog_test;
NOTICE: --Relation iplog_test--
NOTICE: Pages 17722: Changed 0, reaped 0, Empty 0, New 0; Tup 1706202: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 80, MaxLen 88; Re-using: Free/Avail.
Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 1.48s/-1.86u sec.
NOTICE: Index iplog_test_ipaddr_idx: Pages 5621; Tuples 1706202. CPU
0.51s/1.80u sec.
NOTICE: Index iplog_test_ipdate_idx: Pages 4681; Tuples 1706202. CPU
0.36s/1.92u sec.
NOTICE: --Relation pg_toast_253297758--
NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL
0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0;
EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE: Index pg_toast_253297758_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec.
NOTICE: Analyzing...
VACUUM
db=# explain
db-# SELECT sum(input), sum(output) FROM iplog_test
db-# WHERE
db-# '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2001-12-02 00:00:00+02'
AND
db-# '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2002-01-01 00:00:00+02'
AND
db-# ipaddr <<= '193.68.240.0/20' AND 'uni-gw' ~ router;
NOTICE: QUERY PLAN:

Aggregate (cost=56112.97..56112.97 rows=1 width=16)
-> Seq Scan on iplog_test (cost=0.00..56111.54 rows=284 width=16)

EXPLAIN

query runs for ca 3.5 minutes.

db=# set enable_seqscan to off;

the query plan is

Aggregate (cost=100507.36..100507.36 rows=1 width=16)
-> Index Scan using iplog_test_ipdate_idx on iplog_test
(cost=0.00..100505.94 rows=284 width=16)

query runs for ca 2.2 minutes.

Moves closer to your point :-)

Anyway, the platform is an dual Pentium III 550 MHz (intel) machine with 512
MB RAM, with 15000 RPM Cheetah for the database, running BSD/OS 4.2. The
machine is reasonably loaded all the time, so this is very much real-time test.

I agree, that with the 'wrong' clustering the index scan is not so much faster
than the sequential scan.

Perhaps I need to tune this machine's costs to prefer more disk intensive
operations over CPU intensive operations?

Let's see what 4.2 will result in.

Daniel

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Meskes 2002-01-10 10:37:23 Re: ECPG: include sqlca
Previous Message Justin Clift 2002-01-10 09:16:25 Re: Increasing checkpoint distance helps 7.2 noticeably