Re: Watching for view changes

From: George Neuner <gneuner2(at)comcast(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Watching for view changes
Date: 2018-12-23 09:00:19
Message-ID: l1gu1e9pova7q543j5bcs0o0i1njm5io09@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, 21 Dec 2018 23:41:16 -0800, Mitar <mmitar(at)gmail(dot)com> wrote:

>Hi!
>
>On Fri, Dec 21, 2018 at 11:10 PM George Neuner <gneuner2(at)comcast(dot)net> wrote:
>> A materialized view IS exactly such a deliberate cache of results from
>> applying a view. It is a real table that can be monitored for changes
>> using INSERT, UPDATE and/or DELETE triggers.
>
>Caching is needed if you want to compute a difference between previous
>version and new. But if you want to just know new value, then I could
>imagine that (a simple implementation would) on every change to any
>underlying table check if this change matches selectors of the query
>and if such apply its operations/projections and produce the new
>value.

Yes, that could be done. But it isn't. In effect you are asking the
DBMS also to be a spreadsheet: i.e. change this cell and everything
that depends on it gets recomputed.

A spreadsheet is an order of magnitude simpler to implement than a
DBMS, but the combination would be an order of magnitude (or more)
harder. Even object graph databases don't offer spreadsheet
functionality, and it would be a lot easier to do there than in a
table relational system.

>So yes, you need caching if you want to decrease CPU use, but you
>could also see it as new values being computed again and again through
>query. Would such caching you are mentioning really improve
>performance, I do not know, so it might be premature optimization?

It may take only 3 cycles to multiply two numbers, but it can take
thousands of cycles [or millions if the data is on disk] to get those
two numbers into the multiplier.

There always are exceptions, but the general rule is that whenever the
result requires:
- significant computation,
- significant resources, or
- significant time
then you should cache the result instead of recomputing it.

Joins and sorts can take a whole lot of memory (and spill onto disk if
they overflow the work buffer). A fetch of a table or index not in
memory is simple but takes a lot of time - as well as maybe pushing
something else out (increasing the complexity of a competing query).

>If we do go down the cache path, then I agree, materialized views
>seems nice, but I would prefer temporary materialized views: they
>should be cleaned up at the end of the session. Moreover, they should
>be ideally just in memory, not really on disk. Materialized views are
>currently stored to disk, no?

In PG, all *named* tables are backed on disk - even temporary tables.
Only anonymous tables of query results can exist entirely in memory
[and even they can spill onto disk when necessary].

With enough memory you can cache all your tables in shared buffers and
have enough extra that you never run out of work buffers and never
overflow a work buffer. But that is the best you can achieve with PG.

George

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Istvan Soos 2018-12-23 13:39:14 logical replication resiliency
Previous Message Mitar 2018-12-23 06:09:49 Re: Watching for view changes