Re: again on index usage

From: Daniel Kalchev <daniel(at)digsys(dot)bg>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: again on index usage
Date: 2002-01-09 20:06:43
Message-ID: 200201092006.WAA22792@dcave.digsys.bg
Lists: pgsql-hackers

>>>Tom Lane said:
> Daniel Kalchev <daniel(at)digsys(dot)bg> writes:
> > I understand the clustering might help with sequential scans, but why
> > would it help with index scans?
>
> No, the other way around: it makes no difference for seq scans, but can
> speed up index scans quite a lot. With a clustered table, successive
> index-driven fetches tend to hit the same pages rather than hitting
> random pages throughout the table. That saves I/O.

Ok, time to go home :-), but...

> Given the numbers you were quoting, if the table were in perfectly
> random order by ipdate then there would probably have been about three
> rows per page that the indexscan would've had to fetch. This would mean
> touching each page three times in some random order. Unless the table
> is small enough to fit in Postgres' shared buffer cache, that's going to
> represent a lot of extra I/O --- a lot more than reading each page only
> once, as a seqscan would do. At the other extreme, if the table is
> perfectly ordered by ipdate then the indexscan need only hit a small
> number of pages (all the rows we want are in a narrow range) and we
> touch each page many times before moving on to the next. Very few I/O
> requests in that case.

In any case, if we need to hit 50k pages (assuming the indexed data is
randomly scattered in the file) and read each of them three times, that is
still less I/O than reading all 1.7 million records. The table will never be
laid out sequentially on disk, at least not in this case, because rows are
appended day after day - which is also why the data is almost ordered by
ipdate.

What I am arguing about is the scaling - are 50k random reads really worse
than 1.7 million sequential reads? Possibly also taking tuple size, disk
block size, etc. into account.
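A rough sketch of the arithmetic behind that question (all numbers here are assumptions loosely based on the figures quoted in this thread - row count, ~3 matching rows per randomly scattered page, 50k pages hit - not measurements; the heap density of 30 rows per page is hypothetical):

```python
# Back-of-envelope page-read comparison for the three scan scenarios
# discussed above. Numbers are illustrative assumptions, not benchmarks.

rows_total = 1_700_000       # rows in the table (from the thread)
rows_per_page = 30           # assumed heap density (hypothetical)
pages_total = rows_total // rows_per_page          # whole table, ~56k pages

pages_hit = 50_000           # distinct pages the index scan must visit
visits_per_page = 3          # random ipdate order: ~3 matching rows/page,
                             # each fetched by a separate random read

seqscan_reads = pages_total                        # one sequential pass
random_index_reads = pages_hit * visits_per_page   # worst case, no cache reuse
rows_fetched = pages_hit * visits_per_page         # ~150k matching rows
clustered_index_reads = rows_fetched // rows_per_page  # rows packed together

print(f"seq scan:        {seqscan_reads} page reads")
print(f"random index:    {random_index_reads} page reads")
print(f"clustered index: {clustered_index_reads} page reads")
```

Under these assumptions the uncached random index scan issues more page reads than the seqscan, and since a random read also costs more than a sequential one, the gap in wall-clock time would be even larger than the raw counts suggest - which is Tom's point; clustering collapses the fetches onto a few pages and flips the comparison.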

I will wait patiently for 7.2 to be released and see how this same table
performs. :-)

Daniel
