Quick Links

Re: Query about index usage

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Jayadevan M <Jayadevan(dot)Maymala(at)ibsplc(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Query about index usage
Date:	2010-06-23 06:05:44
Message-ID:	4C21A438.9030903@2ndquadrant.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Jayadevan M wrote:
> It is mentioned that table data blocks have data about tuple visibility and hence table scans
> are always necessary. So how does PostgreSQL reduce the number of blocks
> to be read by using indexes?

To be useful, a query utilizing an index must be selective: it must
only return a fraction of the possible rows in the table. Scanning the
index will produce a list of blocks that contain the potentially visible
data, then only those data blocks will be retrieved and tested for
visibility.

Let's say you have a table that's 100 pages (pages are 8KB) and an index
that's 50 pages against it. You run a query that only selects 5% of the
rows in the table, from a continuous section. Very rough estimate, it
will look at 5% * 50 = 3 index pages. Those will point to a matching
set of 5% * 100 = 5 data pages. Now you've just found the right subset
of the data by only retrieving 8 random pages of data instead of 100.
With random_page_cost=4.0, that would give this plan a cost of around
32, while the sequential scan one would cost 100 * 1.0 (sequential
accesses) for a cost of around 100 (Both of them would also have some
smaller row processing cost added in there too).

It's actually a bit more complicated than that--the way indexes are
built means you can't just linearly estimate their usage, and scans of
non-contiguous sections are harder to model simply--but that should give
you an idea. Only when using the index significantly narrows the number
of data pages expected will it be an improvement over ignoring the index
and just scanning the whole table.

If the expected use of the index was only 20% selective for another
query, you'd be getting 20% * 50 = 10 index pages, 20% * 100 = 20 data
pages, for a potential total of 30 random page lookups. That could end
up costing 30 * 4.0 = 120, higher than the sequential scan. Usually
the breakpoint for how much of a table has to be scanned before just
scanning the whole thing sequentially is considered cheaper happens near
20% of it, and you can shift it around by adjusting random_page_cost.
Make it lower, and you can end up preferring index scans even for 30 or
40% of a table.

> Do index data get updated as and when data is committed and made 'visible' or is it
> that index data get updated as soon as data is changed, before commit is
> issued and rollback of transaction results in a rollback of the index data

Index changes happen when the data goes into the table, including
situations where it might not be committed. The index change doesn't
ever get deferred to commit time, like you can things like foreign key
checks. When a transaction is rolled back, the aborted row eventually
gets marked as dead by vacuum, at which point any index records pointing
to it can also be cleaned up.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us

In response to

Re: Query about index usage at 2010-06-23 04:16:19 from Jayadevan M

Responses

Re: Query about index usage at 2010-06-23 06:15:06 from Jayadevan M

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jayadevan M	2010-06-23 06:15:06	Re: Query about index usage
Previous Message	Jayadevan M	2010-06-23 04:16:19	Re: Query about index usage