Re: [PROPOSAL] Temporal query processing with range types

From: Peter Moser <pitiz29a(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Johann Gamper <gamper(at)inf(dot)unibz(dot)it>, Michael Böhlen <boehlen(at)ifi(dot)uzh(dot)ch>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Anton Dignös <anton(dot)dignoes(at)unibz(dot)it>
Subject: Re: [PROPOSAL] Temporal query processing with range types
Date: 2017-02-20 16:38:10
Message-ID: CAHO0eLY0rA7yUrnP2rTwRG2+ep77GM9aU+PhsEaL1GE-Z+_=aA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 15, 2017 at 9:33 PM, David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> On Wed, Feb 15, 2017 at 12:24 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
>> except apparently there's no join type and the optimizer can never
>> reorder these operations with each other or with other joins. Is that
>> right? The optimizer changes in this patch seem fairly minimal, so
>> I'm guessing it can't be doing anything very complex here.
>>
>> + * INPUT:
>> + * (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c
>> + * where q can be any join qualifier, and r.ts, r.te, s.ts,
>> and s.t
>> e
>> + * can be any column name.
>> + *
>> + * OUTPUT:
>> + * (
>> + * SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2
>> + * FROM
>> + * (
>> + * SELECT *, row_id() OVER () rn FROM r
>> + * ) r
>> + * LEFT OUTER JOIN
>> + * s
>> + * ON q AND r.ts < s.te AND r.te > s.ts
>> + * ORDER BY rn, P1, P2
>> + * ) c
>>
>> It's hard to see what's going on here. What's ts? What's te? If you
>> used longer names for these things, it might be a bit more
>> self-documenting.
>
>
> Just reasoning out loud here...
>
> ISTM ts and te are "temporal [range] start" and "temporal [range] end" (or
> probably just the common "timestamp start/end")
>
> From what I can see it is affecting an intersection of the two ranges and,
> furthermore, splitting the LEFT range into sub-ranges that match up with the
> sub-ranges found on the right side. From the example above this seems like
> it should be acting on self-normalized ranges - but I may be missing
> something by evaluating this out of context.
>
> r1 [1, 6] [ts, te] [time period start, time period end]
> s1 [2, 3]
> s2 [3, 4]
> s3 [5, 7]
>
> r LEFT JOIN s ON (r.ts < s.te AND r.te > s.ts)
>
> r1[1, 6],s1[2, 3] => [max(r.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[2, 3]
> r1[1, 6],s2[3, 4] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[3, 4]
> r1[1, 6],s3[5, 7] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[5, 6]
>
> Thus the intersection is [2,6] but since s1 has three ranges that begin
> between 2 and 6 (i.e., 2, 3, and 5) there are three output records that
> correspond to those sub-ranges.

Yes, this is what the internal rewriting produces for r1.
Note that till now we only support half-open ranges, i.e., [), but for
visibility I will continue this example using closed ranges [].
The executor function (ExecTemporalAdjustment) gets this (the output above) as
the input and will then produce:

r1[1, 1]
r1[2, 3]
r1[3, 4]
r1[5, 6]

Which means also for the ALIGN the non-overlapping parts are retained.

>
> The description in the OP basically distinguishes between NORMALIZE and
> ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two
> ranges - discarding the non-overlapping data - while NORMALIZE performs the
> alignment while also retaining the non-overlapping data.

Also for ALIGN we retain the non-overlapping part.
Intersections are symmetric/commutative, so a subsequent outer join can then use
equality on the ranges
to produce join matches (for overlapping) as well as null-extend the
produced non-overlapping parts.

The difference between ALIGN and NORMALIZE is how they split, while ALIGN
produces intersections between pairs of tuples (used for joins) and the
non-overlapping parts, NORMALIZE produces intersections between groups of tuples
(used for aggregation, so that all tuples with the same group have equal
ranges) and the non-overlapping parts.

For instance, the NORMALIZE between r1, s1, s2, and s3 in your example above
would give the following:

r1[1, 1]
r1[2, 2]
r1[3, 3]
r1[4, 4]
r1[5, 6]

>
> The rest of the syntax seems to deal with selecting subsets of range records
> based upon attribute data.

Yes, exactly!

Best regards,
Anton, Johann, Michael, Peter

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Moser 2017-02-20 16:42:49 Re: [PROPOSAL] Temporal query processing with range types
Previous Message Amit Kapila 2017-02-20 16:21:28 Re: [POC] A better way to expand hash indexes.