matview incremental maintenance

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: matview incremental maintenance
Date: 2013-06-17 14:41:15
Message-ID: 1371480075.55528.YahooMailNeo@web162901.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Since there seems to be interest in discussing incremental
maintenance of materialized views *now*, I'm starting this thread
to try to avoid polluting unrelated threads with the discussion.  I
don't intend to spend a lot of time on it until the CF in progress
completes, but at that point the work will start in earnest.  So
I'll say where I'm at, and welcome anyone who has time to spare
outside of the CF to comment or contribute ideas.

The paper at the core of the discussion can be found by searching
for "maintaining views incrementally gupta mumick subrahmanian" --
it's on both the ACM and CiteSeerX websites.  Of course, one
doesn't need to understand that paper to discuss techniques for
capturing the base deltas, but I'm hoping that's not what takes up
most of the discussion.  I expect the most important discussions to
be around how best to handle the "count(t)" (or count_t) column,
what form should be use for intermediate results, how to modify or
add execution nodes which know how to deal with the count, how to
generate set operations to use those nodes, and how to modify the
planner to choose the best plan for these operations.  Whether to
pull the deltas off the WAL stream or stuff them into a tuplestore
as they are written seems to me to be a relatively minor point.  If
properly abstracted, the performance and complexity of alternatives
can be compared.

The one thing that seems somewhat clear to me at the moment is that
the complex set algebra needed to use the counting algorithm for
incremental maintenance is not going to be something I want to
handle by dynamically building up execution nodes.  That way lies
madness. SPI or something very similar to it should be used,
probably with a layer or two above it to simplify working with the
algebra separately from diddling around with strings for the query
fragments.

At the developer meeting last month, we talked about the special
new count column for a bit, and everyone seemed to agree that
adding such an animal, and creating execution nodes which were
aware of it, would best be done on top of the patch Álvaro has been
working on to replace attnum with three columns: a logical ID
number for each column, the physical order of the attribute within
the tuple image, and the display order (for SELECT *, INSERT
without a column list, and similar cases).  We seemed to have
consensus that the count_t column would not display by default, but
could be explicitly called out by a query, similar to the current
handling of system columns.  Nobody wanted to have a negative
column number for the count or add it to the tuple header
structure.  Unfortunately I have heard from Álvaro that the patch
is not complete and is not on his list of things to work on in the
near future.

Long term, timings for incremental maintenance that people would
like to see (from most eager to least eager) are:

 - as part of completing each statement, so that the affect on the
matview is immediately visible to the transaction which modifies a
supporting table, and becomes visible at commit to other
transactions

 - at transaction commit time, so that other transactions see the
changes to the base tables and the referencing matviews at the same
point in time

 - from a FIFO queue which is processed by a background process
whenever data is present (possibly with pacing)

 - from a FIFO queue based on a schedule, so that matviews are
stable between applications and/or to avoid burdening the machine
during peak periods

 - incremental update, or even full refresh, on an attempt to query
a "stale" matview

 - explicit request to apply incremental updates or refresh

Incremental maintenance of a materialized view is a heuristic, to
refresh contents more quickly than might happen by re-running the
query which defines the matview.  There will always be cases where
the changes are so extensive that applying the delta will be slower
than a refresh.  At some point we should have a cost-based way to
recognize when we have crossed that threshold, and fall back to the
refresh technique.  That's not for this release, though.

In previous discussion there seemed to be a consensus that before
incremental maintenance for a materialized view could be turned on,
the matview would need to be populated and all referenced tables
would need to be flagged as generating delta information, through a
new ALTER TABLE option.

While I have yet to look in detail at the mechanism for capturing
the initial delta on the base tables, the two fairly obvious
candidates are to stuff the before and after images into a
tuplestore or temp table as base table changes are written,
somewhere around the point that triggers would be fired, or to use
the WAL stream in some way.  The advantages of the former are that
it would be hard to find a lower overhead way to capture the data,
nor a more certain way to get exactly the right data.  The latter,
which Simon has been arguing is better than using triggers, would
have the advantage of not directly slowing down a process writing
to base tables, although for more eager modes transactions would
need to block waiting for the data to flow through the walsender,
be filtered and assembled as data of interest, and communicated
back to the transaction somehow before it could proceed.  Assuming
that it can provide the changeset prior to the commit, and that it
can include "before" images, it could work, but the timing sure
seems dubious for the more eager modes.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2013-06-17 15:03:35 Re: Support for REINDEX CONCURRENTLY
Previous Message Andres Freund 2013-06-17 14:16:22 PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn