Re: Materialized views WIP patch

From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>,pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views WIP patch
Date: 2012-11-16 16:25:57
Message-ID: 20121116162558.90150@gmx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Josh Berkus wrote:

>> 1. CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
>> TABLE AS, with all the same clauses supported. That includes
>> declaring a materialized view to be temporary or unlogged.
>
> What use would a temporary matview be?

It would be essentially like a temporary table, with all the same
persistence options. I'm not really sure how often it will be more
useful than a temporary table before we have incremental maintenance
of materialized views; once we have that, though, it seems likely
that there could be reasonable use cases.

> Unlogged is good.

I agree that there are likely to be more use cases for this than
temp MVs. Unfortunately, I've had a hard time figuring out how to
flag an MV which is empty because its contents were lost after a
crash with preventing people from using an MV which hasn't been
populated, which has the potential to silently return incorrect
results.

>> 2. MVs don't support inheritance.
>
> In which direction? Can't inherit, or can't be inherited from?

The table inheritance has not been implemented in either direction
for MVs. It didn't seem clear to me that there were reasonable use
cases. Do you see any?

>> 9. MVs can't directly be used in a COPY statement, but can be the
>> source of data using a SELECT.
>
> Hmmm? I don't understand the reason for this.

Consistency. There are other object types which seem to enforce this
rule for no reason that I can see beyond maybe a desire to have both
directions of COPY work with the same set of objects. If I remember
correctly, allowing this would eliminate one line of code from the
patch, so if sentiment is in favor of it, it is very easily done.

>> 13. pg_class now has a relisvalid column, which is true if an MV is
>> truncated or created WITH NO DATA. You can not scan a relation
>> flagged as invalid.
>
> What error would a user see?

I can more directly answer that on Monday. If you enable the body of
the function which makes the relisvalid check you can see the messages.
I commented it out because I have not yet figured out how to suppress
the check for a LOAD MV command.

>> 14. ALTER MATERIALIZED VIEW is supported for the options that seemed
>> to make sense. For example, you can change the tablespace or
>> schema, but you cannot add or drop column with ALTER.
>
> How would you change the definition of an MV then?

At this point you would need to drop and re-create the MV. If we
want to add columns to an MV or change what an existing column holds,
perhaps we could have an ALTER MV which changed the SELECT statement
that populates the MV? I would prefer to leave that to a later patch,
though -- it seems like a bit of a minefield compared to what is
being implemented in this patch.

>> 16. To get new data into the MV, the command is LOAD MATERIALIZED
>> VIEW mat view_name. This seemed more descriptive to me that the
>> alternatives and avoids declaring any new keywords beyond
>> MATERIALIZED. If the MV is flagged as relisvalid == false, this
>> will change it to true.
>
> UPDATE MATERIALIZED VIEW was problematic?

Not technically, really, but I saw two reasons that I preferred LOAD MV:

1. It seems to me to better convey that the entire contents of the MV
  will be built from scratch, rather than incrementally adjusted.
2. We haven't hashed out the syntax for more aggressive maintenance of
  an MV, and it seemed like UPDATE MV might be syntax we would want to
  use for something which updated selected parts of an MV when we do.

> Does LOAD automatically TRUNCATE the view before reloading it? If not,
> why not?

It builds a new heap and moves it into place. When the transaction
running LMV commits, the old heap is deleted. In implementation it is
closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
creating a new table. This allows all permissions, etc., to stay in
place.

>> It would be good to have some discussion to try to reach a consensus
>> about whether we need to differentiate between *missing* datat (where
>> a materialized view which has been loaded WITH NO DATA or TRUNCATEd
>> and has not been subsequently LOADed) and potentially *stale* data.
>> If we don't care to distinguish between a view which generated no
>> rows when it ran and a one for which the query has not been run, we
>> can avoid adding the relisvalid flag, and we could support UNLOGGED
>> MVs. Perhaps someone can come up with a better solution to that
>> problem.
>
> Hmmm. I understand the distinction you're making here, but I'm not sure
> it actually matters to the user. MVs, by their nature, always have
> potentially stale data. Being empty (in an inaccurate way) is just one
> kind of stale data.

Robert feels the same way, but I disagree. Some MVs will not be terribly
volatile. In my view there is a big difference between having a "top ten"
list which might be based on yesterday's base tables rather than the base
table states as of this moment, and having a "top ten" list with no
entries. If you want to, for example, take some action if an order comes
in for one of your top customers, and a different path for other
customers, suddenly treating all of your long-time top customers as not
being so, without any squawk from the database, seems dangerous.

> It would be nice for the user to have some way to know that a matview is
> empty due to never being LOADed or recently being TRUNCATEd. However, I
> don't think that relisvalid flag -- and preventing scanning the relation
> -- is a good solution. What I'd rather have instead is a timestamp of
> when the MV was last LOADed. If the MV was never loaded (or was
> truncated) that timestamp would be NULL. Such a timestamp would allow
> users to construct all kinds of ad-hoc refresh schemes for MVs which
> would not be possible without it.

I see your point there; I'll think about that. My take was more that MVs
would often be refreshed by crontab, and that you would want to keep
subsequent steps from running and generating potentially plausible but
completely inaccurate results if the LMV failed.

> I don't see how this relates to UNLOGGED matviews either way.

UNLOGGED tables and indexes are made empty on crash recovery by copying
the initialization fork over the "normal" relations. Care was taken to
avoid needing to connect to each database in turn to complete that
recovery. This style of recovery can't really set the relisvalid flag, as
far as I can see; which leaves us choosing between unlogged MVs and
knowing whether they hold valid data -- unless someone has a better idea.

-Kevin

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2012-11-16 16:30:00 Re: Materialized views WIP patch
Previous Message Alvaro Herrera 2012-11-16 16:17:47 Re: foreign key locks