Re: Materialized views WIP patch

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2012-11-16 21:20:02
Message-ID: 50A6AE02.7000102@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kevin,

> I agree that there are likely to be more use cases for this than
> temp MVs. Unfortunately, I've had a hard time figuring out how to
> flag an MV which is empty because its contents were lost after a
> crash with preventing people from using an MV which hasn't been
> populated, which has the potential to silently return incorrect
> results.

See below.

>>> 2. MVs don't support inheritance.
>>
>> In which direction? Can't inherit, or can't be inherited from?
>
> The table inheritance has not been implemented in either direction
> for MVs. It didn't seem clear to me that there were reasonable use
> cases. Do you see any?

No, I just wanted clarity on this. I can see a strong case for
eventually supporting CREATE MATERIALIZED VIEW matview_1 LIKE matview,
in order to "copy" mativews, though.

> Consistency. There are other object types which seem to enforce this
> rule for no reason that I can see beyond maybe a desire to have both
> directions of COPY work with the same set of objects. If I remember
> correctly, allowing this would eliminate one line of code from the
> patch, so if sentiment is in favor of it, it is very easily done.

There's going to be a pretty strong demand for COPY FROM matviews.
Forcing the user to use COPY FROM ( SELECT ... ) will be seen as
arbitrary and unintuitive.

>> How would you change the definition of an MV then?
>
> At this point you would need to drop and re-create the MV. If we
> want to add columns to an MV or change what an existing column holds,
> perhaps we could have an ALTER MV which changed the SELECT statement
> that populates the MV? I would prefer to leave that to a later patch,
> though -- it seems like a bit of a minefield compared to what is
> being implemented in this patch.

I agree that it should be a later patch.

> Not technically, really, but I saw two reasons that I preferred LOAD MV:
>
> 1. It seems to me to better convey that the entire contents of the MV
> will be built from scratch, rather than incrementally adjusted.
> 2. We haven't hashed out the syntax for more aggressive maintenance of
> an MV, and it seemed like UPDATE MV might be syntax we would want to
> use for something which updated selected parts of an MV when we do.

Hmmm, I see your point. So "LOAD" would recreate, and (when supported)
UPDATE would incrementally update?

> It builds a new heap and moves it into place. When the transaction
> running LMV commits, the old heap is deleted. In implementation it is
> closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
> creating a new table. This allows all permissions, etc., to stay in
> place.

OK, so same effect as a truncate.

> Robert feels the same way, but I disagree. Some MVs will not be terribly
> volatile. In my view there is a big difference between having a "top ten"
> list which might be based on yesterday's base tables rather than the base
> table states as of this moment, and having a "top ten" list with no
> entries. If you want to, for example, take some action if an order comes
> in for one of your top customers, and a different path for other
> customers, suddenly treating all of your long-time top customers as not
> being so, without any squawk from the database, seems dangerous.

Right, but a relisvalid flag just tells me that the matview was updated
at sometime in the past, and not *when* it was updated. It could have
been 3 years ago. The fact that it was updated at some indefinite time
is fairly valueless information.

There's a rule in data warehousing which says that it's better to have
no data (and know that you have no data) than to have incorrect data.

> I see your point there; I'll think about that. My take was more that MVs
> would often be refreshed by crontab, and that you would want to keep
> subsequent steps from running and generating potentially plausible but
> completely inaccurate results if the LMV failed.

Yeah, that too. Also, a timestamp it would make it easy to double-check
if the cron job was failing or had been disabled.

> UNLOGGED tables and indexes are made empty on crash recovery by copying
> the initialization fork over the "normal" relations. Care was taken to
> avoid needing to connect to each database in turn to complete that
> recovery. This style of recovery can't really set the relisvalid flag, as
> far as I can see; which leaves us choosing between unlogged MVs and
> knowing whether they hold valid data -- unless someone has a better idea.

Yeah, well, whether we have relisvalid or mvlastupdate, we're going to
have to work out some way to have that field react to changes to the
table overall. I don't know *how*, but it's something we'll have to solve.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2012-11-16 22:05:25 Re: another idea for changing global configuration settings from SQL
Previous Message Antonin Houska 2012-11-16 21:02:05 Re: Materialized views WIP patch