pgsql is 75 times faster with my new index scan

From: devik(at)cdi(dot)cz
To: pgsql-hackers(at)postgresql(dot)org
Subject: pgsql is 75 times faster with my new index scan
Date: 2000-09-26 09:15:28
Message-ID: 39D06930.F2F88139@cdi.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,
I recently spoke about extending index scan to be able
to take data directly from index pages. I wanted to know
whether should I spend my time and implement it.
So that I hacked last pgsql a bit to use proposed scan
mode and did some measurements (see bellow). Measurements
was done on (id int,txt varchar(20)) table with 1 000 000 rows
with btree index on both attrs. Query involved was:
select id,count(txt) from big group by id;
Duplicates distribution on id column was 1:1000. I was run
query twice after linux restart to ensure proper cache
utilization (on disk heap & index was 90MB in total).
So I think that by implementing this scan mode we can expect
to gain huge speedup in all queries which uses indices and
can found all data in their pages.

Problems:
my changes implemented only indexscan and new cost function.
it doesn't work when index pages contains tuples which doesn't
belong to our transaction. test was done after vacuum and
only one tx running.

TODO:
- add HeapTupleHeaderData into each IndexTupleData
- change code to reflect above
- when deleting-updating heap then also update tuples'
HeapTupleHeaderData in indices

The last step could be done in two ways. First by limiting
number of indices for one table we can store coresponding
indices' TIDs in each heap tuple. The update is then simple
taking one disk write.
Or do it in standart way - lookup appropriate index tuple
by traversing index. It will cost us more disk accesses.

Is someone interested in this ??
regards devik

With current indexscan:
! system usage stats:
! 1812.534505 elapsed 93.060547 user 149.447266 system sec
! [93.118164 user 149.474609 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 130978/32 [131603/297] page faults/reclaims, 132 [132] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
! Shared blocks: 555587 read, 551155 written, buffer hit
rate = 44.68%
! Local blocks: 0 read, 0 written, buffer hit
rate = 0.00%
! Direct blocks: 0 read, 0 written

With improved indexscan:
! system usage stats:
! 23.686788 elapsed 22.157227 user 0.372071 system sec
! [22.193359 user 0.385742 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 1186/42 [1467/266] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
! Shared blocks: 4385 read, 0 written, buffer hit
rate = 4.32%
! Local blocks: 0 read, 0 written, buffer hit
rate = 0.00%
! Direct blocks: 0 read, 0 written

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2000-09-26 09:28:07 Re: pgsql is 75 times faster with my new index scan
Previous Message Jarmo Paavilainen 2000-09-26 07:39:59 Case sensitive field names