Re: GSoC - proposal - Materialized Views in PostgreSQL

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: pavelbaros <baros(dot)p(at)seznam(dot)cz>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC - proposal - Materialized Views in PostgreSQL
Date: 2010-04-10 07:07:35
Message-ID: 4BC023B7.9040402@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg Smith wrote:
> The main hidden complexity in this particular project relates to
> handling view refreshes. The non-obvious problem is that when the view
> updates, you need something like a SQL MERGE to really handle that in a
> robust way that doesn't conflict with concurrent access to queries
> against the materialized view. And work on MERGE support is itself
> blocked behind the fact that PostgreSQL doesn't have a good way to lock
> access to a key value that doesn't exist yet--what other databases call
> key range locking. See the notes for "Add SQL-standard
> MERGE/REPLACE/UPSERT command" at http://wiki.postgresql.org/wiki/Todo
> for more information.
>
> You can work around that to build a prototype by grabbing a full table
> lock on the materialized view when updating it, but that's not a
> production quality solution.

It would still be useful for many applications. And it would provide a
basis to extend later. You don't need to solve all problems at once, as
long as what you implement is a useful subset.

> Now, with all that said, that doesn't mean there's not a useful project
> for you buried in this mess. The first two steps in your plan:
>
> 1) create materialized view
> 2) change rewriter
>
> Include building a prototype grammer, doing an initial executor
> implementation, and getting some sort of rewriter working. That is
> potentially good groundwork to lay here. I would suggest that you
> completely drop your step 3:
>
> 3) create command that takes snapshot (refresh MV)
>
> Because you cannot built that in a way that will be useful (and by that
> I mean committable quality) until there's a better way to handle updates
> than writing a whole new table and grabbing a full relation lock to
> switch to it. To do a good job just on the first two steps should take
> at least a whole summer anyway--there's a whole stack of background
> research needed I haven't seen anyone do yet, and that isn't on your
> plan yet. There is a precedent for taking this approach. After getting
> stalled trying to add the entirety of easy partitioning to PostgreSQL,
> the current scope has been scaled back to just trying to get the syntax
> and on-disk structure right, then finish off the implementation. See
> http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how
> that's been broken into those two major chunks.

The good thing about this subject for GSoC is that it can be divided
into many small steps. There's two largely independent main parts:

1. Keep the materialized view up-to-date when the base tables change.
This can be further divided into many steps, you can begin by supporting
automatic updates only on very simple views with e.g a single table and
a where clause. Then extend that to support joins, aggregates,
subqueries etc. Keeping it really limited, you could even require the
user to write the required triggers himself.

2. Teach the planner to use materialized views automatically when a
query references the base tables. So if you issue the query "SELECT *
FROM table WHERE foo > 10 AND bar = 10", and there's a materialized view
on "SELECT * FROM table WHERE bar = 10", the planner can transform the
original query into "SELECT * FROM materializedview WHERE foo > 10".
This largely depends on 1, although some DBMSs offer the option to use
manually refreshed materialized views too, knowing that they might not
be completely up-to-date.

There's a lot room to choose which problems you want to tackle, which is
good for a summer-of-code project. Your proposal basically describes
doing 1, in a limited fashion where the view is not updated
automatically, but only when the DBA runs a command to refresh it. I'm
not sure if that's useful enough on its own, writing "CREATE
MATERIALIZED VIEW ... SELECT ..." doesn't seem any easier than just
writing "CREATE TABLA AS ...". But if you can do something about 2, or
even a very limited part of 1, keeping the view up-to-date
automatically, it becomes much more useful.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2010-04-10 07:20:16 Re: Set LC_COLLATE to de_DE_phoneb
Previous Message Yeb Havinga 2010-04-10 06:26:49 Re: extended operator classes vs. type interfaces