Postgres, DB2 and Materialized Query Tables (MQTs - DB2s Materialized Views)

From: Sandeep Srinivasa <sss(at)clearsenses(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Postgres, DB2 and Materialized Query Tables (MQTs - DB2s Materialized Views)
Date: 2010-10-05 19:28:54
Message-ID: AANLkTikuDyOrGE0Ku11FuGJXczS+bMMXX2PRnxVqOTM4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Below is an excerpt from a comment over at Reddit. It seemed quite
interesting to read about the difference in PG and DB2's query optimizer.

Can anybody shed any more light on this ?

-Sandeep

----------------------

there are a couple really important things that DB2 does with Materialized
Query Tables (MQTs, DB2's name for Oracle/everyone else's Materialized
Views) that other DBs -- especially free DBs -- don't do:

1. Consider the contents MQTs in the query optimizer
2. Update MQTs incrementally

for example, imagine you have an MQT backed by this query:

SELECT YEAR(somedate), COUNT(*) FROM sometable GROUP BY YEAR(somedate)

now imagine that, completely independently, you want to run the following
query:

SELECT COUNT(*) FROM sometable WHERE YEAR(somedate)=2010

if you're in PostgreSQL (or MySQL, or pretty much any DB but Oracle), then
the query optimizer will build a plan that involves a table scan over
sometable. for a large table, that sucks. (a clustered index over somedate
onsometable can make this less painful, but still. bleh.)

if you're in DB2, the DB2 query optimizer is smart enough to realize that
you have that data laying around in your MQT, so it just does a lookup for
2010 in the MQT and returns the corresponding COUNT(*) value. if the MQT has
an index over YEAR(somedate), then this is *blazing* fast. Point DB2. and
because the query optimizer considers MQTs out-of-the-box with no changes to
existing applications querying the database, if a client's aggregation query
is running slow and you can define an MQT that backs that query and a
suitable update schedule for the MQT, the client's query will go 10x's
faster and require no application changes at all. Another point DB2.

also, DB2 (and Oracle too, pretty sure) lets you update MQTs incrementally.
in DB2, when you want to add records to an MQT, you can update the MQT's
contents with just those records as opposed to re-running the query that
backs it. (for most aggregation functions, anyway.) so, whereas in
PostgreSQL you'd have to re-run SELECT COUNT(*) FROM sometable WHERE
YEAR(somedate)=2010 to update your MQT if you add records, in DB2 you can
just pass those records through the existing MQT and update aggregate values
where appropriate. (the syntax for that escapes me at the moment,
unfortunately. any DB2-heads lurking about that can help with that?)

anyway, DB2 is *excellent* at managing enterprise-level data sets. MQTs are
an awesome feature it has, and they really change the kinds of applications
you can write against the database.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ozz Nixon 2010-10-05 19:32:30 Re: optimization (can I move pgsql_tmp)?
Previous Message Chris Barnes 2010-10-05 19:12:14 Postgre 9.0 replication using streaming.