Skip site navigation (1) Skip section navigation (2)

Re: Query about index usage

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Jayadevan M <Jayadevan(dot)Maymala(at)ibsplc(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query about index usage
Date: 2010-06-23 06:05:44
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
Jayadevan M wrote:
> It is mentioned that table data blocks have data about tuple visibility and hence table scans 
> are always necessary. So how does PostgreSQL reduce the number of blocks 
> to be read by using indexes?

To be useful, a query utilizing an index must be selective:  it must 
only return a fraction of the possible rows in the table.  Scanning the 
index will produce a list of blocks that contain the potentially visible 
data, then only those data blocks will be retrieved and tested for 

Let's say you have a table that's 100 pages (pages are 8KB) and an index 
that's 50 pages against it.  You run a query that only selects 5% of the 
rows in the table, from a continuous section.  Very rough estimate, it 
will look at 5% * 50 = 3 index pages.  Those will point to a matching 
set of 5% * 100 = 5 data pages.  Now you've just found the right subset 
of the data by only retrieving 8 random pages of data instead of 100.  
With random_page_cost=4.0, that would give this plan a cost of around 
32, while the sequential scan one would cost 100 * 1.0 (sequential 
accesses) for a cost of around 100 (Both of them would also have some 
smaller row processing cost added in there too).

It's actually a bit more complicated than that--the way indexes are 
built means you can't just linearly estimate their usage, and scans of 
non-contiguous sections are harder to model simply--but that should give 
you an idea.  Only when using the index significantly narrows the number 
of data pages expected will it be an improvement over ignoring the index 
and just scanning the whole table.

If the expected use of the index was only 20% selective for another 
query, you'd be getting 20% * 50 = 10 index pages, 20% * 100 = 20 data 
pages, for a potential total of 30 random page lookups.  That could end 
up costing 30 * 4.0 = 120, higher than the sequential scan.    Usually 
the breakpoint for how much of a table has to be scanned before just 
scanning the whole thing sequentially is considered cheaper happens near 
20% of it, and you can shift it around by adjusting random_page_cost.  
Make it lower, and you can end up preferring index scans even for 30 or 
40% of a table.

> Do index data get updated as and when data is committed and made 'visible' or is it 
> that index data get updated as soon as data is changed, before commit is 
> issued and rollback of transaction results in a rollback of the index data

Index changes happen when the data goes into the table, including 
situations where it might not be committed.  The index change doesn't 
ever get deferred to commit time, like you can things like foreign key 
checks.  When a transaction is rolled back, the aborted row eventually 
gets marked as dead by vacuum, at which point any index records pointing 
to it can also be cleaned up.

Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support

In response to


pgsql-performance by date

Next:From: Jayadevan MDate: 2010-06-23 06:15:06
Subject: Re: Query about index usage
Previous:From: Jayadevan MDate: 2010-06-23 04:16:19
Subject: Re: Query about index usage

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group