Re: [PROPOSAL] Temporal query processing with range types

From: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
To: Anton Dignös <dignoes(at)inf(dot)unibz(dot)it>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Johann Gamper <gamper(at)inf(dot)unibz(dot)it>, Michael Böhlen <boehlen(at)ifi(dot)uzh(dot)ch>, Moser Peter <peter(dot)moser(at)unibz(dot)it>
Subject: Re: [PROPOSAL] Temporal query processing with range types
Date: 2017-10-06 17:22:44
Message-ID: CA+renyX6cws7DpJUerAvfAv2iNgf2eCRmFwWBPa6Ci_PHHepvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 22, 2016 at 4:15 AM, Anton Dignös <dignoes(at)inf(dot)unibz(dot)it> wrote:
> We would like to contribute to PostgreSQL a solution that supports the query
> processing of "at each time point". The basic idea is to offer two new
> operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
> ranges of tuples so that subsequent queries can use the usual grouping and
> equality conditions to get the intended results.

I just wanted to chime in and say that the work these people have done
is *amazing*. I read two of their papers yesterday [1, 2], and if you
are interested in temporal data, I encourage you to read them too. The
first one is only 12 pages and quite readable. After that the second
is easy because it covers a lot of the same ground but adds "scaling"
of values when a tuple is split, and some other interesting points.
Their contributions could be used to implement SQL:2011 syntax but go
way beyond that.

Almost every project I work on could use temporal database support,
but there is nothing available in the Open Source world. The
temporal_tables extension [3] offers transaction-time support, which
is great for auditing, but it has no valid-time support (aka
application-time or state-time). Same with Magnus Hagander's TARDIS
approach [4], Chronomodel [5] (an extension to the Rails ORM), or any
other project I've seen. But valid-time is the more valuable
dimension, because it tells you the history of the thing itself (not
just when the database was changed). Also nothing is even attempting
full bitemporal support.

The ideas behind temporal data are covered extensively in Snodgrass's
1999 book [6], which shows how valuable it is to handle temporal data
in a principled way, rather than ad hoc. But that book also
demonstrates how complex the queries become to do things like temporal
foreign key constraints and temporal joins. I was sad to learn that
his proposed TSQL2 was rejected as a standard back in the 90s,
although the critiques by C. J. Date [7] have some merit. In
particular, since TSQL2 used *statement* modifiers, some of the
behavior was unclear or bad when using subqueries, views, and
set-returning functions. It makes more sense to have temporal
*operators*, so alongside inner join you have temporal inner join, and
likewise with temporal left outer join, temporal
union/intersection/difference, temporal aggregation, etc. (I think the
drawbacks of TSQL2 came from pursuing an unachievable goal, which was
to enable seamlessly converting existing non-temporal tables to
temporal without breaking any queries.)

Another unsatisfactory approach at historical data, from the industry
rather than academia, is in chapter 4 and elsewhere of Ralph Kimball's
*Data Warehouse Toolkit* [8]. His first suggestion (Type 1 Dimensions)
is to ignore the problem and overwrite old data with new. His Type 2
approach (make a new row) is better but loses the continuity between
the old row and the new. Type 3 fixes that but supports only one
change, not several. And anyway his ideas are tailored to star-schema
designs so are not as broadly useful. Workarounds like bridge tables
and "put the data in the fact table" are even more wedded to a
star-schema approach. But I think his efforts do show how valuable
historical data is, and how hard it is to handle without built-in
support.

As far as I can tell SQL:2011 avoids the statement modifier problem
(I'm not 100% sure), but it is quite limited, mostly covering
transaction-time semantics and not giving any way to do valid-time
outer joins or aggregations. It is clearly an early first step.
Unfortunately the syntax feels (to me) crippled by over-specificity,
like it will have a hard time growing to support all the things you'd
want to do.

The research by Dignös et al shows how you can define temporal
variants for every operator in the relational algebra, and then
implement them by using just two transformations (ALIGN and NORMALIZE)
combined with the existing non-temporal operators. It has a strong
theoretical basis and avoids the TSQL2 problems with composability.
And unlike SQL:2011 it has a great elegance and completeness I haven't
seen anywhere else.

I believe with range types the approach was to build up useful
primitives rather than jumping straight to a less-factored full
implementation of temporal features. (This in spite of SQL:2011
choosing to model begin/end times as separate columns, not as ranges.
:-) It seems to me the Dignös work follows the same philosophy. Their
ALIGN and NORMALIZE could be used to implement SQL:2011 features, but
they are also useful for much more. In their papers they actually
suggest that these transformations need not be exposed to end-users,
although it was convenient to have access to them for their own
research. I think it'd be great if Postgres's SQL dialect supported
them though, since SQL:2011 leaves out so much.

Anyway, I wanted to thank them for their excellent work, their
generosity, and also their perseverance. ([1] is from 2012 and was
built against Postgres 9.0!) I hope we take their contribution
seriously, because it would truly move Postgres's temporal support
beyond any database on the market.

Yours,
Paul

[1] https://files.ifi.uzh.ch/boehlen/Papers/modf174-dignoes.pdf

[2] http://www.zora.uzh.ch/id/eprint/130374/1/Extending_the_kernel.pdf

[3] https://pgxn.org/dist/temporal_tables/

[4] https://www.youtube.com/watch?v=TRgni5q0YM8

[5] https://github.com/ifad/chronomodel

[6] *Developing Time-Oriented Database Applications in SQL*,
downloadable for free from
https://www2.cs.arizona.edu/~rts/publications.html

[7] http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=8CA34414C364C1D859CD0EAE7A714DFF?doi=10.1.1.116.7598&rep=rep1&type=pdf

[8] http://www.dsc.ufcg.edu.br/~sampaio/Livros/alph%20Kimball.%20The%20Data%20Warehouse%20Toolkit..%20The%20Complete%20Guide%20to%20Dimensional%20Modelling%20(Wiley,2002)(ISBN%200471200247)(449s).pdf

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-10-06 17:49:56 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message Peter Geoghegan 2017-10-06 17:03:51 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple