Re: [PROPOSAL] Temporal query processing with range types

From: Peter Moser <pitiz29a(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Johann Gamper <gamper(at)inf(dot)unibz(dot)it>, Michael Böhlen <boehlen(at)ifi(dot)uzh(dot)ch>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Anton Dignös <anton(dot)dignoes(at)unibz(dot)it>
Subject: Re: [PROPOSAL] Temporal query processing with range types
Date: 2017-02-20 16:42:49
Message-ID: CAHO0eLYyvuqwF=2FsgDn1xOs_NOrFBu9Xh-Wq+aWfFy0y6=jWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2017-02-15 20:24 GMT+01:00 Robert Haas <robertmhaas(at)gmail(dot)com>:
> So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
> except apparently there's no join type and the optimizer can never
> reorder these operations with each other or with other joins. Is that
> right? The optimizer changes in this patch seem fairly minimal, so
> I'm guessing it can't be doing anything very complex here.

ALIGN/NORMALIZE operators are aliased from-clause items, which get rewritten
into a subquery using LEFT OUTER JOIN. The main idea behind that
is to reuse as much as possible of the existing PostgreSQL code, and
just provide
one executor function to process the output of these rewritten
queries. The optimizer
code is minimal, because we do not use any new constructs (except the
temporal adjustment
node for align/normalize) due to these rewrites. That is, all needed
optimization
techniques are already present.

> What happens if you perform the ALIGN or NORMALIZE operation using
> something other than an equality operator, like, say, less-than? Or
> an arbitrary user-defined operator.

It is possible to use ALIGN/NORMALIZE with user-defined functions,
and non-equality operators.

> There's no documentation in this patch. I'm not sure you want to go
> to the trouble of writing SGML documentation until this has been
> reviewed enough that it has a real chance of getting committed, but on
> the other hand we're obviously all struggling to understand what it
> does, so I think if not SGML documentation it at least needs a real
> clear explanation of what the syntax is and does in a README or
> something, even just for initial review.

We are currently writing SGML documentation and extend in-code comments.
Both, will be send soon.

> It's hard to see what's going on here. What's ts? What's te? If you
> used longer names for these things, it might be a bit more
> self-documenting.

ts, te describe an half-open interval in which the tuple is considered valid:

[time point start, time point end).

We have added an extended version of comments for both parser functions, i.e.,
transformTemporalAligner and transformTemporalNormalizer. See attached patch
(src/backend/parser/parse_temporal.c).

> If we are going to transform an ALIGN operator in to a left outer
> join, why do we also have an executor node for it?
>
> + fcLowerLarg = makeFuncCall(SystemFuncName("lower"),
> +
> list_make1(crLargTs),
> +
> UNKNOWN_LOCATION);
> + fcLowerRarg = makeFuncCall(SystemFuncName("lower"),
> +
> list_make1(crRargTs),
> +
> UNKNOWN_LOCATION);
> + fcUpperLarg = makeFuncCall(SystemFuncName("upper"),
> +
> list_make1(crLargTs),
> +
> UNKNOWN_LOCATION);
> + fcUpperRarg = makeFuncCall(SystemFuncName("upper"),
> +
> list_make1(crRargTs),
> +
> UNKNOWN_LOCATION);
>
> Why is a temporal operator calling functions that upper-case and
> lower-case strings? In one sense this whole function (and much of the
> nearby code) is very straightforward code and you can see exactly why
> it's doing it. In another sense it's totally inscrutable: WHY is it
> doing any of that stuff?

These functions extract the lower-bound, and upper-bound of range types.

>
> - char *strategy; /* partitioning strategy
> ('list' or 'range') */
> - List *partParams; /* List of PartitionElems */
> - int location; /* token
> location, or -1 if unknown */
> + char *strategy; /* partitioning strategy ('list' or 'range') */
> + List *partParams; /* List of PartitionElems */
> + int location; /* token location, or
> -1 if unknown */
>
> I think this is some kind of mistake on your end while generating the
> patch. It looks like you patched one version of the source code, and
> diffed against another.

Thank you for pointing this out, we fixed it in our new patch.

2017-02-16 13:41 GMT+01:00 Robert Haas <robertmhaas(at)gmail(dot)com>:
> Also, it sounds like all of this is intended to work with ranges that
> are stored in different columns rather than with PostgreSQL's built-in
> range types.

Our syntax supports PostgreSQL's built-in range types and ranges that
are stored in different columns.

For instance, for range types an ALIGN query would look like this:
SELECT * FROM (r ALIGN s ON q WITH (r.t, s.t)) c

... and for ranges in different columns like this:
SELECT * FROM (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c

... where r and s are input relations, q can be any join qualifier, and
r.t, s.t, r.ts, r.te, s.ts, and s.te can be any column name. The
latter represent the valid time intervals, that is time point start,
and time point end of each tuple for each input relation. These can
be defined as four scalars, or two half-open, i.e., [), range typed
values.

Best regards,
Anton, Johann, Michael, Peter

ps. The patch has been rebased on top of
commit 090f21bbad98001979da8589e9647a1d49bce4ee
from "Sun Feb 19 17:18:10 2017 -0500"

Attachment Content-Type Size
tpg_primitives_out_v5.patch text/x-patch 151.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2017-02-20 16:50:26 Re: Checksums by default?
Previous Message Peter Moser 2017-02-20 16:38:10 Re: [PROPOSAL] Temporal query processing with range types