Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From: Peter Moser <peter(dot)moser(at)unibz(dot)it>
To: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
Cc: Anton Dignös <dignoes(at)inf(dot)unibz(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Johann Gamper <gamper(at)inf(dot)unibz(dot)it>, Michael Böhlen <boehlen(at)ifi(dot)uzh(dot)ch>
Subject: Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
Date: 2017-11-14 16:19:35
Message-ID: CAHO0eLbhxNrOixgOSGx+H98diWSbH=fvb5tJrZHvpRQfbjGFUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2017-10-06 19:22 GMT+02:00 Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>:
> I just wanted to chime in and say that the work these people have done
> is *amazing*. I read two of their papers yesterday [1, 2], and if you
> are interested in temporal data, I encourage you to read them too. The
> first one is only 12 pages and quite readable. After that the second
> is easy because it covers a lot of the same ground but adds "scaling"
> of values when a tuple is split, and some other interesting points.
> Their contributions could be used to implement SQL:2011 syntax but go
> way beyond that.
>
> Almost every project I work on could use temporal database support,
> but there is nothing available in the Open Source world. The
> temporal_tables extension [3] offers transaction-time support, which
> is great for auditing, but it has no valid-time support (aka
> application-time or state-time). Same with Magnus Hagander's TARDIS
> approach [4], Chronomodel [5] (an extension to the Rails ORM), or any
> other project I've seen. But valid-time is the more valuable
> dimension, because it tells you the history of the thing itself (not
> just when the database was changed). Also nothing is even attempting
> full bitemporal support.
>
> The ideas behind temporal data are covered extensively in Snodgrass's
> 1999 book [6], which shows how valuable it is to handle temporal data
> in a principled way, rather than ad hoc. But that book also
> demonstrates how complex the queries become to do things like temporal
> foreign key constraints and temporal joins. I was sad to learn that
> his proposed TSQL2 was rejected as a standard back in the 90s,
> although the critiques by C. J. Date [7] have some merit. In
> particular, since TSQL2 used *statement* modifiers, some of the
> behavior was unclear or bad when using subqueries, views, and
> set-returning functions. It makes more sense to have temporal
> *operators*, so alongside inner join you have temporal inner join, and
> likewise with temporal left outer join, temporal
> union/intersection/difference, temporal aggregation, etc. (I think the
> drawbacks of TSQL2 came from pursuing an unachievable goal, which was
> to enable seamlessly converting existing non-temporal tables to
> temporal without breaking any queries.)
>
> Another unsatisfactory approach at historical data, from the industry
> rather than academia, is in chapter 4 and elsewhere of Ralph Kimball's
> *Data Warehouse Toolkit* [8]. His first suggestion (Type 1 Dimensions)
> is to ignore the problem and overwrite old data with new. His Type 2
> approach (make a new row) is better but loses the continuity between
> the old row and the new. Type 3 fixes that but supports only one
> change, not several. And anyway his ideas are tailored to star-schema
> designs so are not as broadly useful. Workarounds like bridge tables
> and "put the data in the fact table" are even more wedded to a
> star-schema approach. But I think his efforts do show how valuable
> historical data is, and how hard it is to handle without built-in
> support.
>
> As far as I can tell SQL:2011 avoids the statement modifier problem
> (I'm not 100% sure), but it is quite limited, mostly covering
> transaction-time semantics and not giving any way to do valid-time
> outer joins or aggregations. It is clearly an early first step.
> Unfortunately the syntax feels (to me) crippled by over-specificity,
> like it will have a hard time growing to support all the things you'd
> want to do.
>
> The research by Dignös et al shows how you can define temporal
> variants for every operator in the relational algebra, and then
> implement them by using just two transformations (ALIGN and NORMALIZE)
> combined with the existing non-temporal operators. It has a strong
> theoretical basis and avoids the TSQL2 problems with composability.
> And unlike SQL:2011 it has a great elegance and completeness I haven't
> seen anywhere else.
>
> I believe with range types the approach was to build up useful
> primitives rather than jumping straight to a less-factored full
> implementation of temporal features. (This in spite of SQL:2011
> choosing to model begin/end times as separate columns, not as ranges.
> :-) It seems to me the Dignös work follows the same philosophy. Their
> ALIGN and NORMALIZE could be used to implement SQL:2011 features, but
> they are also useful for much more. In their papers they actually
> suggest that these transformations need not be exposed to end-users,
> although it was convenient to have access to them for their own
> research. I think it'd be great if Postgres's SQL dialect supported
> them though, since SQL:2011 leaves out so much.
>
> Anyway, I wanted to thank them for their excellent work, their
> generosity, and also their perseverance. ([1] is from 2012 and was
> built against Postgres 9.0!) I hope we take their contribution
> seriously, because it would truly move Postgres's temporal support
> beyond any database on the market.

Paul, you are spot on. Your comments are really insightful and the
understanding of the pros and cons of the different solutions is
impressive.

Some additional background about our approach:

During more than 20 years of research in the temporal database area we
have seen many ideas that at the end did not make a real difference for
temporal query processing. Our goal was to identify and implement the
basic functionality that database systems must offer to support the
processing of temporal data. Our answer to this is the adjustment of
time ranges with ALIGN and NORMALIZE. We believe these are general and
useful primitives that will greatly benefit any possible temporal
extension of SQL.

We have been using ALIGN and NORMALIZE since several years in our
teaching at different universities. Students have been using the
patched PostgreSQL kernel and everything (understanding the concepts;
using the primitives in numerous examples) has worked out really well.
As you have noticed the primitives were also well-received in the
flagship outlets of the database research community.

Best regards,
Anton, Johann, Michael, Peter

> [1] https://files.ifi.uzh.ch/boehlen/Papers/modf174-dignoes.pdf
>
> [2] http://www.zora.uzh.ch/id/eprint/130374/1/Extending_the_kernel.pdf
>
> [3] https://pgxn.org/dist/temporal_tables/
>
> [4] https://www.youtube.com/watch?v=TRgni5q0YM8
>
> [5] https://github.com/ifad/chronomodel
>
> [6] *Developing Time-Oriented Database Applications in SQL*,
> downloadable for free from
> https://www2.cs.arizona.edu/~rts/publications.html
>
> [7] http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=8CA34414C364C1D859CD0EAE7A714DFF?doi=10.1.1.116.7598&rep=rep1&type=pdf
>
> [8] http://www.dsc.ufcg.edu.br/~sampaio/Livros/alph%20Kimball.%20The%20Data%20Warehouse%20Toolkit..%20The%20Complete%20Guide%20to%20Dimensional%20Modelling%20(Wiley,2002)(ISBN%200471200247)(449s).pdf
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Moser 2017-11-14 16:33:17 Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
Previous Message Peter Eisentraut 2017-11-14 16:18:02 plpgsql test layout