Re: [PROPOSAL] Temporal query processing with range types

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mike Rylander <mrylander(at)gmail(dot)com>
Cc: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>, Anton Dignös <dignoes(at)inf(dot)unibz(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Johann Gamper <gamper(at)inf(dot)unibz(dot)it>, Michael Böhlen <boehlen(at)ifi(dot)uzh(dot)ch>, Moser Peter <peter(dot)moser(at)unibz(dot)it>
Subject: Re: [PROPOSAL] Temporal query processing with range types
Date: 2017-11-11 12:19:26
Message-ID: CA+Tgmoa8qxsT51fF2d7dsB2cZsJRY0=6H=_KV+u9P_SJopGSLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 6, 2017 at 3:22 PM, Mike Rylander <mrylander(at)gmail(dot)com> wrote:
> I've also been following this feature with great interest, and would
> definitely throw whatever tiny weight I have, sitting out here in the
> the peanut gallery, behind accepting the ALIGN and NORMALIZE syntax.
> I estimate that about a third of the non-trivial queries in the
> primary project I work on (and have, on Postgres, for the last 13+
> years) would be simpler with support of the proposed syntax, and some
> of the most complex business logic would be simplified nearly to the
> point of triviality.

This is really good input. If the feature weren't useful, then it
wouldn't make sense to try to figure out how to integrate it, but if
it is, then we should try.

I don't think that implementing a feature like this by SQL
transformation can work. It's certainly got the advantage of
simplicity of implemention, but there are quite a few things that seem
like they won't always work correctly. For instance:

+ * INPUT:
+ * (r ALIGN s ON q WITH (r.t, s.t)) c
+ *
+ * where r and s are input relations, q can be any
+ * join qualifier, and r.t, s.t can be any column name. The latter
+ * represent the valid time intervals, that is time point start,
+ * and time point end of each tuple for each input relation. These
+ * are two half-open, i.e., [), range typed values.
+ *
+ * OUTPUT:
+ * (
+ * SELECT r.*, GREATEST(LOWER(r.t), LOWER(s.t)) P1,
+ * LEAST(UPPER(r.t), UPPER(s.t)) P2
+ * FROM
+ * (
+ * SELECT *, row_id() OVER () rn FROM r
+ * ) r
+ * LEFT OUTER JOIN
+ * s
+ * ON q AND r.t && s.t
+ * ORDER BY rn, P1, P2
+ * ) c

One problem with this is that we end up looking up functions in
pg_catalog by name: LOWER, UPPER, LEAST, GREATEST. In particular,
when we do this...

+ fcUpperRarg = makeFuncCall(SystemFuncName("upper"),
+ list_make1(crRargTs),
+ UNKNOWN_LOCATION);

...we're hoping and praying that we're going to latch onto the first of these:

rhaas=# \df upper
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+-------+------------------+---------------------+--------
pg_catalog | upper | anyelement | anyrange | normal
pg_catalog | upper | text | text | normal
(2 rows)

But that's only true as long as there isn't another function in
pg_catalog with a match to the specific range type that is being used
here, and there's nothing to stop a user from creating one, and then
their query, which does not anywhere in its query text mention the
name of that function, will start failing. We're not going to accept
that limitation. Looking up functions by name rather than by OID or
using an opclass or something is pretty much a death sentence for a
core feature, and this patch does a lot of it.

A related problem is that, because all of this transformation is being
done in the parser, when you use this temporal syntax to create a
view, and then run pg_dump to dump that view, you are going to (I
think) get the transformed version, not the original. Things like
transformTemporalClause are going to have user-visible effects: the
renaming you do there will (I think) be reflected in the deparsed
output of views. That's not good. Users have a right to expect that
what comes out of deparsing will at least resemble what they put in.

Error reporting might be a problem too: makeTemporalQuerySkeleton is
creating parse nodes that have no location, so if an error develops at
that point, how will the user correlate that with what they typed in?

I suspect there are also problems with plan invalidation. Any
decisions we make at parse time are fixed forever. DDL changes can
force a re-plan, but not a re-parse.

Overall, I think that the whole approach here probably needs to be
scrapped and rethought. The stuff this patch is doing really belongs
in the optimizer, not the parser, I think. It could possibly happen
at a relatively early stage in the optimizer so that the rest of the
optimizer can see the results of the transformation and, well,
optimize. But parse time is way too early.

Unrelated to the above, this patch introduces various kinds of helper
functions which are general-purpose in function but dumped in with the
temporal support because it happens to use them. For instance:

+static Form_pg_type
+typeGet(Oid id)
+{
+ HeapTuple tp;
+ Form_pg_type typtup;
+
+ tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(id));
+ if (!HeapTupleIsValid(tp))
+ ereport(ERROR,
+ (errcode(ERROR),
+ errmsg("cache lookup failed for type %u", id)));
+
+ typtup = (Form_pg_type) GETSTRUCT(tp);
+ ReleaseSysCache(tp);
+ return typtup;
+}

A function as general as typeGet() certainly does not belong in
parse_clause.c in the middle of a long list of temporal functions.
This particular function is also a bad idea in general, because typtup
is only a valid pointer until ReleaseSysCache() is called. This will
appear to work normally because, normally, no relevant cache
invalidation messages will show up at the wrong time, but if one does
then typtup will be pointing off into outer space. You can find bugs
like this by testing with CLOBBER_CACHE_ALWAYS defined (warning: this
makes things very slow). But the general point I want to make here is
actually about inventing your own idioms vs. using the ones we've got
-- if you're going to invent new general-purpose primitives, they need
to go next to the existing functions that do similar things, not in
whatever part of the code you first decided you needed them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-11-11 12:19:56 Re: parallelize queries containing initplans
Previous Message Robert Haas 2017-11-11 11:35:38 Re: GatherMerge misses to push target list