Skip site navigation (1) Skip section navigation (2)

Re: GSoC Query

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: gaurav gupta <gauravkumar(dot)gupta(at)students(dot)iiit(dot)ac(dot)in>
Cc: xzilla(at)users(dot)sourceforge(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC Query
Date: 2010-03-29 07:00:21
Message-ID: 4BB05005.3050007@2ndquadrant.com (view raw or flat)
Thread:
Lists: pgsql-hackers
gaurav gupta wrote:
> My idea is to add a functionality of Auto tuning and Auto Indexing/ 
> Reindexing in DB languages.

Ah, the classic request to start with the tuning and index wizards.  
Really fun to work on, always lots of interest in it.  Here's the 
thing:  projects in this area attract endless interest.  If it were 
possible to write something useful in a couple of months, we'd have a 
hundred such programs fighting for attention.  So the fact that we 
actually have zero of them should tell you something about the actual 
difficultly level of the work.  You could spend the whole summer just 
reading research papers on this topic and maybe catch up to the late 
90's by the end.

Here's the usual advice I give to students looking to make a useful 
contribution to any mature development project:  the more boring the 
work sounds, the more likely it is you'll actually do something people 
can use.  It's easy to find people who want to work on fun projects--so 
easy that they've all been done already.  What's left is either much 
harder than it looks, or kind of dull to do.  The idea behind 
intentionally picking a boring one is that you're more likely to get one 
that's unfinished for that reason, rather than because it's actually a 
year or two of work to complete.  Or, in the case you're asking about, a 
decade or three if you were to start from scratch and were really 
smart.  If you started working on this now rather than stopping to 
follow the research already done you might catch up to 
http://portal.acm.org/citation.cfm?id=810505 in a couple of months.

> Similarly using the no. of select hits on a table we can check that if 
> maximum no. of times it is on a non-index field we can index on that 
> field to make select faster.

It's impractical to figure out where indexes should go at without 
simulating what the optimizer would then do with them against a sample 
set of queries.  You can't do anything useful just with basic statistics 
about the tables.

I would recommend 
http://msdn.microsoft.com/en-us/library/aa226167(SQL.70).aspx as a good, 
practical introduction to the topic of what it takes to figure out where 
indexes go at, from someone who came up with a reasonable solution to 
that problem.  You can find a list of the underlying research they cite 
(and an idea what has been done since then) at 
http://portal.acm.org/citation.cfm?id=673646

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com   www.2ndQuadrant.us


In response to

  • GSoC Query at 2010-03-29 02:01:45 from gaurav gupta

Responses

  • Re: GSoC Query at 2010-03-29 10:42:08 from Gokulakannan Somasundaram

pgsql-hackers by date

Next:From: Simon RiggsDate: 2010-03-29 08:15:10
Subject: Re: More idle thoughts
Previous:From: Dimitri FontaineDate: 2010-03-29 06:54:19
Subject: Re: join removal

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group