|From:||Joe Wildish <joe-postgresql(dot)org(at)elusive(dot)cx>|
|To:||PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: Implementing SQL ASSERTION|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
Attached is a WIP patch for SQL assertion. I am posting it for anyone who might be interested in seeing it, for comments/feedback, and to see if others are keen to collaborate on taking it further. It is not near production-ready (see thoughts on that below).
The patch builds on the work posted by Peter back in 2013. I've taken his code and updated it to conform to some general changes made to the codebase since then. The bulk of the new work I have done is around when an assertion needs to be checked. Essentially it is an implementation of the algorithm described by Ceri & Widom in "Deriving Production Rules for Constraint Maintenance” — http://infolab.stanford.edu/pub/papers/constraint-maintenance.ps
The general idea is to traverse the expression tree and derive the set of potentially invalidating operations. These operations are used to determine when the constraint trigger fires and causes a re-check. The detail is in the paper but some examples are:
* insertion into the subject of an exists cannot be invalidating;
* deletion from the subject of a not exists cannot be invalidating;
* update of columns in the target list of an exists cannot be invalidating;
* certain combinations of aggregates with comparison operations cannot be invalidating.
As an example of the last point, the expression "CHECK (10 > (SELECT COUNT(*) FROM t))" cannot be invalidated by a delete or an update but can be invalidated by an insert.
I have implemented most of the optimisations mentioned in the paper. There are one or two that I am unsure about, specifically how to deal with set-operations that are the subject of an exists. According to the paper, these are optimisable when they're the subject of an exists, but I think it is only applicable for union and not intersect or except, so I have skipped that particular optimisation for the time being.
The algorithm works under the assumption that when a recheck occurs the previous check result was true (the research report by Ceri & Widom does acknolwedge this assumption). However, unfortunately the SQL specification requires that both true and unknown be valid results for an assertion's check expression. This doesn't play too well with the algorithm so for the time being I have disallowed null. I think the solution here may be that when a null result for a check occurs, the assertion is changed to trigger on all operations against the involved tables; once it returns to true, the triggers can be returned to fire only on the derived invalidating operations. More thought required though. (note: having just written this paragraph, I've realised I can't right now think of a concrete example illustrating the point, so it may be that I'm wrong on this).
The paper does mention a set of optimisations that I have not yet attempted to implement. These are essentially the technique of evaluating the expression against the deltas of a change rather than the full tables. Clearly there is a large overlap with incremental maintainence of views and actually the two authors of the paper have a similiarly named paper called "Deriving Production Rules for Incremental View Maintanence". Although I have yet to finish reviewing all the literature on the subject, I suspect that realistically for this to make it into production, we'd need some implementation of these techniques to make the performance palatable.
|Next Message||Edmund Horner||2018-01-15 01:12:37||Re: PATCH: psql tab completion for SELECT|
|Previous Message||Petr Jelinek||2018-01-14 23:15:58||Re: Logical decoding fast-forward and slot advance|