Re: Query Rewrite for Materialized Views (Postgres Extension)

From: Nico Williams <nico(at)cryptonector(dot)com>
To: Dent John <denty(at)QQdd(dot)eu>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: Query Rewrite for Materialized Views (Postgres Extension)
Date: 2018-06-18 19:13:55
Message-ID: 20180618191355.GC4200@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 18, 2018 at 07:38:13PM +0100, Dent John wrote:
> I commented to Corey (privately) that, while my rewrite extension has
> gotten me a server that responds quickly to aggregate queries, the
> constant need to refresh the supporting MVs means the system’s load
> average is constant and much higher than before. I’m happy with the
> tradeoff for now, but it’s a huge waste of energy, and I’m sure it
> must thrash my disk.
>
> I’m very interested in what other people think of Corey’s idea.

I've written an alternative materialization extension (entirely as
PlPgSQL) based on PG's internals, but my version has a few big wins that
might help here. I'm thinking of properly integrating it with PG. Some
of the features include:

- you can write triggers that update the materialization

This is because the materialization is just a regular table in my
implementation.

- you can mark a view as needing a refresh (e.g., in a trigger)

- you can declare a PK, other constraints, and indexes on a
materialization

The DMLs used to refresh a view concurrently can take advantage of
the PK and/or other indexes to go fast.

- you get a history table which records updates to the materialization

This is useful for generating incremental updates to external
systems.

Keeping track of refresh times should help decide whether to use or not
use a materialization in some query, or whether to refresh it first, or
not use it at all.

One of the things I'd eventually like to do is analyze the view query
AST to automatically generate triggers to update materializations or
mark them as needing refreshes. A first, very very rough sketch of such
an analysis looks like this:

- if the view query has CTEs
-> create triggers on all its table sources to mark the
materialization as needing a refresh

- else if a table appears more than once as a table source in the view
query
-> create triggers on that table that mark the materialization as
needing a refresh

- else if a table appears anywhere other than the top-level
-> create triggers .. mark as needing refresh

- else if a table is a right-side of a left join
-> create triggers .. mark as needing refresh

- else if a table has no PK
-> create triggers .. mark as needing refresh

- else if the query has no GROUP BY, or only does a GROUP BY on this
table and a list of columns prefixed by the table's PK
-> rewrite the query to have WHERE eq conditions on values for the
table's PK columns

analyze this query

if the result shows this table source as the first table in the
plan
-> create triggers on this table to update the materialization
directly from querying the source view

- else
-> create triggers .. mark as needing refresh

Nico
--

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message AJG 2018-06-18 19:38:00 Re: Column store in Greenplum
Previous Message Jim Finnerty 2018-06-18 18:51:12 Re: Query Rewrite for Materialized Views (FDW Extension)