Skip site navigation (1) Skip section navigation (2)

Re: GSoC - proposal - Materialized Views in PostgreSQL

From: Pavel <baros(dot)p(at)seznam(dot)cz>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-20 17:59:52
Message-ID: 4BCDEB98.8010608@seznam.cz (view raw or flat)
Thread:
Lists: pgsql-hackers
Greg Smith wrote:

> pavelbaros wrote:
>> I am also waiting for approval for my repository named 
>> "materialized_view" on git.postgresql.org, so I could publish 
>> completed parts. 
>
> Presuming that you're going to wander there and get assigned what 
> looks like an official repo name for this project is a 
> bit...optimistic.  I would recommend that you publish to something 
> like github instead (you can fork http://github.com/postgres/postgres 
> ), and if the work looks good enough that it gets picked up by the 
> community maybe you migrate it onto the main site eventually.  
> git.postgresql.org is really not setup to be general hosting space for 
> everyone who has a PostgreSQL related project; almost every repo on 
> there belongs to someone who has already been a steady project 
> contributor for a number of years.

Yes, you're true, I'm kind of newbe in this kind of project and 
specially in PostgreSQL. But I think it is best way to get into 
PostgreSQL. When I chose my bachelor thesis I did not know I could 
participate GSoC or try to make it commitable. Anyway I will make repo 
on github, so everybody could look at it, as soon as posible.

<http://github.com/pbaros/postgres>
>  
> (Switching to boilerplate mode for a paragraph...) You have picked a 
> PostgreSQL feature that is dramatically more difficult than it appears 
> to be, and I wouldn't expect you'll actually finish even a fraction of 
> your goals in a summer of work.  You're at least in plentiful 
> company--most students do the same.  As a rule, if you see a feature 
> on our TODO list that looks really useful and fun to work on, it's 
> only still there because people have tried multiple times to build it 
> completely but not managed to do so because it's harder than it 
> appears.  This is certainly the case with materialized views.
>
> You've outlined a reasonable way to build a prototype that does a 
> limited implementation here.  The issue is what it will take to extend 
> that into being production quality for the real-world uses of 
> materialized views.  How useful your prototype is depends on how well 
> it implements a subset of that in a way that will get used by the 
> final design.
>
> The main hidden complexity in this particular project relates to 
> handling view refreshes.  The non-obvious problem is that when the 
> view updates, you need something like a SQL MERGE to really handle 
> that in a robust way that doesn't conflict with concurrent access to 
> queries against the materialized view.  And work on MERGE support is 
> itself blocked behind the fact that PostgreSQL doesn't have a good way 
> to lock access to a key value that doesn't exist yet--what other 
> databases call key range locking.  See the notes for "Add SQL-standard 
> MERGE/REPLACE/UPSERT command" at http://wiki.postgresql.org/wiki/Todo 
> for more information.
>
> You can work around that to build a prototype by grabbing a full table 
> lock on the materialized view when updating it, but that's not a 
> production quality solution.  Solving that little detail is actually 
> more work than the entire project you've outlined.  Your suggested 
> implementation--"In function CloseIntoRel executor swap relfilenode's 
> of temp table and original table and finally delete temp table"--is 
> where the full table lock is going to end up at.  The exact use cases 
> that need materialized views cannot handle a CLUSTER-style table 
> recreation each time that needs an exclusive lock to switchover, so 
> that whole part of your design is going to be a prototype that doesn't 
> work at all like what needs to get built to make this feature 
> committable.  It's also not a reasonable assumption that you have 
> enough disk space to hold a second copy of the MV in a production system.

For now I know it is not commitable in actual state, but for my thesis 
it is enough and I know it will not be commitable with this design at 
all. In case of GSoC it will depends on the time I will be able to spend 
on it, if I will consider some other design.

>
> Once there's a good way to merge updates, how to efficiently generate 
> them against the sort of large data sets that need materalized 
> views--so you just write out the updates rather than a whole new 
> copy--is itself a large project with a significant quantity of 
> academic research to absorb before starting.  Dan Colish at Portland 
> State has been playing around with prototypes for the specific problem 
> of finding a good algorithm for view refreshing that is compatible 
> with PostgreSQL's execution model.  He's already recognized the table 
> lock issue here and for the moment is ignoring that part.  I don't 
> have a good feel yet for how long the targeted update code will take 
> to mature, but based on what I do know I suspect that little detail is 
> also a larger effort than the entire scope you're envisioning.  
> There's a reason why the MIT Press compendium "Materialized Views: 
> Techniques, Implementations, and Applications" is over 600 pages 
> long--I hope you've already started digging through that material.

I would like to start to dig through that, but I'm in a hurry now. I 
already have made a small research on MV as part of my thesis. I also 
plan to continue study PostgreSQL and Materialized Views more into the 
depth, preferably as my master thesis. But I realize MV feature 
commitable to PostgreSQL is not project for one person, of course.

>
> Now, with all that said, that doesn't mean there's not a useful 
> project for you buried in this mess.  The first two steps in your plan:
>
> 1) create materialized view
> 2) change rewriter
>
> Include building a prototype grammer, doing an initial executor 
> implementation, and getting some sort of rewriter working.  That is 
> potentially good groundwork to lay here.  I would suggest that you 
> completely drop your step 3:
>
> 3) create command that takes snapshot (refresh MV)
>
> Because you cannot built that in a way that will be useful (and by 
> that I mean committable quality) until there's a better way to handle 
> updates than writing a whole new table and grabbing a full relation 
> lock to switch to it.  To do a good job just on the first two steps 
> should take at least a whole summer anyway--there's a whole stack of 
> background research needed I haven't seen anyone do yet, and that 
> isn't on your plan yet.  There is a precedent for taking this 
> approach.  After getting stalled trying to add the entirety of easy 
> partitioning to PostgreSQL, the current scope has been scaled back to 
> just trying to get the syntax and on-disk structure right, then finish 
> off the implementation.  See 
> http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how 
> that's been broken into those two major chunks.
>

Anyway thanks for all of your advices and help.

best regards,
        Pavel Baros




In response to

Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2010-04-20 18:24:35
Subject: Re: Thoughts on pg_hba.conf rejection
Previous:From: Marc G. FournierDate: 2010-04-20 17:53:49
Subject: Re: BETA

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group