Re: Materialized views proposal

From: Sailesh Krishnamurthy <sailesh(at)cs(dot)berkeley(dot)edu>
To: Hannu Krosing <hannu(at)tm(dot)ee>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Jonathan Gardner <jgardner(at)jonathangardner(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Materialized views proposal
Date: 2003-11-30 22:46:46
Message-ID: bxyisl11nl5.fsf@datafix.cs.berkeley.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Hannu" == Hannu Krosing <hannu(at)tm(dot)ee> writes:

Hannu> Neil Conway kirjutas P, 30.11.2003 kell 02:18:
>> Jonathan Gardner <jgardner(at)jonathangardner(dot)net> writes: > 3) We
>> would implement some sort of differential view update scheme >
>> based on the paper "Efficiently Updating Materialized
>> Views"[1].

Hannu> Maybe the TelegraphCQ engine can give some ideas

Hannu> http://telegraph.cs.berkeley.edu/

Hannu> from brief reading, it seems that all they do is kind of
Hannu> materialized views ;) - they call it Continuous Dataflow
Hannu> Processing, i.e. queries that run continuously over
Hannu> incoming data.

A fair portion of the community argues that everything that we do is
merely materialized views :-)

While materialized views are certainly related, there are however,
things that are quite different in processing continuous queries over
data streams. The first ideas on stream processing did come from work
on scalable triggers which is intimately related to materialized
views. Work on mviews isn't really concerned with things like windowed
joins and aggregate processing.

My advice is to take a very incremental approach to materialized view
implementation. There are essentially 2 pieces - one the maintenance
of the views and second the routing of queries to automagically use
the materialized views. The former is done by something like triggers
and the latter is through either a query-rewrite mechanism or
something like an optimizer choosing a different access method.

As for the former, there are again two subdivisions - what changes you
propagate and when you apply 'em. What you propagate are the
deltas. So you have the following choices:

IPIA - Immediate Propagate, Immediate Apply
IPDA - Immediate Propagate, Deferred Apply
DPDA - Deferred Propagate, Deferred Apply

DPIA - makes no sense ..

The easiest is DPDA .. in this model you essentially recompute the
view on demand. The next is IPIA.. with IPIA, you can choose to either
recompute the entire view or only figure out how to translate a delta
into appropriate changes in the mview. After that there is IPDA
.. this involves creating a delta table that is populated based on the
triggers. Every so often you process a set of deltas and recompute the
view (either in its entirety or in parts>

Another way to slice the problem is to limit the scope of queries that
can be used to define views.

I suggest that first we don't consider joins .. and only consider
grouped aggregates over a single table. Joins are more hairy with
changes in one table potentially requiring a wholesale recomputation
of the join.

Note that all this discussion is only about the first part of the
problem .. maintenance of the views. The other part, routing is almost
equally complicated .. there you have to solve the query subsumption
problem without also eliminating choices in your access plan.

As people have said there are plenty of papers on this in the
literature. While I am no theorist I can certainly help with reading
the papers .. not every bit of a paper is very useful.

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Neil Conway 2003-11-30 23:18:54 Re: [HACKERS] initdb should create a warning message [was Re:
Previous Message Alex Satrapa 2003-11-30 22:38:06 Re: Was: Triggers, Stored Procedures, PHP