Re: Postgres, DB2 and Materialized Query Tables (MQTs - DB2s Materialized Views)

From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: Sandeep Srinivasa <sss(at)clearsenses(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres, DB2 and Materialized Query Tables (MQTs - DB2s Materialized Views)
Date: 2010-10-06 00:56:10
Message-ID: AANLkTim68kC3dkioiUp_jy=6koUBPHM0-C=_2-Kr4ogX@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Oct 5, 2010 at 3:28 PM, Sandeep Srinivasa <sss(at)clearsenses(dot)com> wrote:
> Below is an excerpt from a comment over at Reddit. It seemed quite
> interesting to read about the difference in PG and DB2's query optimizer.

The entire post makes it sound like the author believes that Postgres
has Materialized Views built-in to its core. It doesn't -- what you
can have are materialized views that you populate using triggers.
There was a GSoC project to try to add snapshot materialized views
into core a few months ago, though I think it would be fairly
rudimentary compared to the DB2 capabilities your post describes.

[snip]
> if you're in PostgreSQL (or MySQL, or pretty much any DB but Oracle), then
> the query optimizer will build a plan that involves a table scan
> over sometable. for a large table, that sucks. (a clustered index
> over somedate onsometable can make this less painful, but still. bleh.)

Naturally, since the Postgres optimizer has no knowledge about any
materialized views you may have constructed by hand.

The flip side of this is that when you build a materialized view by
hand, you can fine-tune it for performance pretty well: you can
control exactly how it gets refreshed, and use tricks involving your
knowledge of the base tables to take shortcuts in the view refreshing
(e.g. if you know that several types of updates to your base tables
wouldn't get reflected in the view anyway, you can safely ignore
them).

> if you're in DB2, the DB2 query optimizer is smart enough to realize that
> you have that data laying around in your MQT, so it just does a lookup
> for 2010 in the MQT and returns the corresponding COUNT(*) value. if the MQT
> has an index over YEAR(somedate), then this is blazing fast. Point DB2.

I guess this is nice. But again, you could just build the mat. view by
hand on Postgres, and your query will be just as fast, if you've coded
your application to pull from the view instead of the base table.

> and
> because the query optimizer considers MQTs out-of-the-box with no changes to
> existing applications querying the database, if a client's aggregation query
> is running slow and you can define an MQT that backs that query and a
> suitable update schedule for the MQT, the client's query will go 10x's
> faster and require no application changes at all. Another point DB2.

> also, DB2 (and Oracle too, pretty sure) lets you update MQTs incrementally.

You can do this with a hand-built mat. view in Postgres as well.

> in DB2, when you want to add records to an MQT, you can update the MQT's
> contents with just those records as opposed to re-running the query that
> backs it. (for most aggregation functions, anyway.) so, whereas in
> PostgreSQL you'd have to re-run SELECT COUNT(*) FROM sometable WHERE
> YEAR(somedate)=2010 to update your MQT if you add records,

Not true, unless you don't understand the difference between a
snapshot materialized view and the other types.

> in DB2 you can
> just pass those records through the existing MQT and update aggregate values
> where appropriate. (the syntax for that escapes me at the moment,
> unfortunately. any DB2-heads lurking about that can help with that?)
>
> anyway, DB2 is excellent at managing enterprise-level data sets. MQTs are an
> awesome feature it has, and they really change the kinds of applications you
> can write against the database.

Don't get me wrong, I'm not saying it wouldn't be cool to have the
capabilities the author is describing built-in to core, but doing
things by hand really isn't so bad, especially for
performance-critical views. I haven't used DB2, but I would think you
give up some performance and flexibility by using their automated
MQTs. Note that maybe the definitive source for materialized views on
PostgreSQL specifically says: "I do not recommend that PostgreSQL add
Materialized Views in its core.", with good reasons:
<http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views>

Josh

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Min Yin 2010-10-06 02:26:16 How do I write this query? Distinct, Group By, Order By?
Previous Message Scott Ribe 2010-10-06 00:48:18 Re: help modeling a schedule system