Re: counting algorithm for incremental matview maintenance

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: counting algorithm for incremental matview maintenance
Date: 2013-05-17 20:55:24
Message-ID: 1368824124.33398.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> wrote:
> 2013/5/17 Kevin Grittner <kgrittn(at)ymail(dot)com>:
>> Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com> wrote:
>>
>>> Note that the basic count algorithm assumes real-serializable
>>> transactions for correctness. Example:
>
> [..]
>
>> Good point.
>>
>> It might be hard to detect when this type of race condition exists,
>> since it could be hidden inside a function.  Any thoughts on that?
>
> I think that matviews with defining queries that use functions for
> which the system can not prove whether they are safe for the given
> incremental maintenance technique (e.g., count algorithm) and
> situation (e.g., mandatory isolation level), should not be allowed to
> use incremental maintenance of that type.

It's clear enough that any IMMUTABLE function should be safe, but
for anything less I don't see how we could trust it short of having
a "white list" of which functions are known to be safe, or creating
a new property of for functions.  Not something I want to require
up front; maybe later....

> Although I have only limited practical matview experience (with SQL
> Server (2005?) only, there it is called “indexed views”): I assume
> that in most other DBMSs, the queries that can be used for defining
> incrementally maintained matviews are very limited (they are in SQL
> Server).

In researching this feature I have found a few comments by Oracle
users that when incremental maintenance was first added (in 1999)
the queries which it could support were severely limited, but that
they have expanded coverage with each major release.  People don't
seem to have too many complaints about what is covered now;
although how much of that is adapting to the limits and how much is
due to expansion of the limits is not clear.  You would think that
somewhere they would define what the limits for a given version
are, but I have not stumbled across anything like that.

>>> Each of the following would fix this problem:
>>>
>>> (2) The matview’s defining query and the corresponding base tables
>>> must adhere to certain (very commonly satisfied) conditions (for
>>> example, if there is an FK between A.a and B.b, the example cannot be
>>> made to fail; The FK must not be deferred in the case of
>>> right-after-each-statement matview maintenance).
>>
>> Agreed.  That would cover a high percentage of cases regardless of
>> transaction isolation level.  Knowing when you needed such a
>> constraint to make incremental maintenance safe would be hard,
>> though.
>
> Reasoning about it becomes easier when you have a clean definition of
> which queries you accept, and which ones you don’t (see below). Then
> you can look at your definition and say “I allow X because it is
> always correct  I don’t allow Y because I didn’t prove it to be always
> correct (yet).”

Agreed.  I'm just haven't figured out precisely how concurrency
issues adjust those boundaries yet.

>>> (3) The count algorithm must be implemented in a way that understands
>>> MVCC internals: Reading the base tables must be done using a technique
>>> that reads all rows (i.e., also the ones not visible to the current
>>> transaction), and the xmin/xmax/etc of the updated rows in the matview
>>> must be set in a smart way, so as to yield visibility results that
>>> correspond to the visibility of the “originating” rows in the base
>>> tables. Calculation of the matview deltas (especially the part where
>>> the base tables are inspected for joining rows) must in this case
>>> probably be done in a serialized manner.
>>
>> I will look at this option some more, but on the face of it, I'm
>> not quite following how it would work; and it sounds invasive and
>> fragile. If you know of any paper on this approach you could point
>> me to, or if you could sketch it out in a little more detail, I
>> would appreciate that.
>
> Sorry, these are my own musings: I have never found any paper
> about it.

> [musings]

Will review and ponder these and what Claudio offered, and see
where that takes me.

Thanks to both of you for the input!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Farina 2013-05-17 21:03:20 Re: askpass program for libpq
Previous Message Merlin Moncure 2013-05-17 20:48:38 Re: fallocate / posix_fallocate for new WAL file creation (etc...)