Re: [PROPOSAL] Temporal query processing with range types

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Moser <pitiz29a(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Johann Gamper <gamper(at)inf(dot)unibz(dot)it>, Michael Böhlen <boehlen(at)ifi(dot)uzh(dot)ch>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Anton Dignös <anton(dot)dignoes(at)unibz(dot)it>
Subject: Re: [PROPOSAL] Temporal query processing with range types
Date: 2017-02-15 20:33:52
Message-ID: CAKFQuwY3zTTvYySUswWnLL_EL01vevdNfXLn6a0AeGqOS-_Pdw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 15, 2017 at 12:24 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Jan 24, 2017 at 4:32 AM, Peter Moser <pitiz29a(at)gmail(dot)com> wrote:
> >> Using common terms such as ALIGN and NORMALIZE for such a specific
> >> functionality seems a bit wrong.
> >
> > Would ALIGN RANGES/RANGE ALIGN and NORMALIZE RANGES/RANGE NORMALIZE be
> better
> > options? We are also thankful for any suggestion or comments about the
> syntax.
>
> So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
> except apparently there's no join type and the optimizer can never
> reorder these operations with each other or with other joins. Is that
> right? The optimizer changes in this patch seem fairly minimal, so
> I'm guessing it can't be doing anything very complex here.
>
> + * INPUT:
> + * (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c
> + * where q can be any join qualifier, and r.ts, r.te, s.ts,
> and s.t
> e
> + * can be any column name.
> + *
> + * OUTPUT:
> + * (
> + * SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2
> + * FROM
> + * (
> + * SELECT *, row_id() OVER () rn FROM r
> + * ) r
> + * LEFT OUTER JOIN
> + * s
> + * ON q AND r.ts < s.te AND r.te > s.ts
> + * ORDER BY rn, P1, P2
> + * ) c
>
> It's hard to see what's going on here. What's ts? What's te? If you
> used longer names for these things, it might be a bit more
> self-documenting.
>

Just reasoning out loud here...​

ISTM ts and te are "temporal [range] start" and "temporal [range] end"​ (or
probably just the common "timestamp start/end")

​From what I can see it is affecting an intersection of the two ranges and,
furthermore, splitting the LEFT range into sub-ranges that match up with
the sub-ranges found on the right side. From the example above this seems
like it should be acting on self-normalized ranges - but I may be missing
something by evaluating this out of context.

r1 [1, 6] [ts, te] [time period start, time period end]
s1 [2, 3]
s2 [3, 4]
s3 [5, 7]

r LEFT JOIN s ON (r.ts < s.te AND r.te > s.ts)

r1[1, 6],s1[2, 3] => [max(r.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[2, 3]
r1[1, 6],s2[3, 4] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[3, 4]
r1[1, 6],s3[5, 7] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[5, 6]

Thus the intersection is [2,6] but since s1 has three ranges that begin
between 2 and 6 (i.e., 2, 3, and 5) there are three output records that
correspond to those sub-ranges.

The description in the OP basically distinguishes between NORMALIZE and
ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two
ranges - discarding the non-overlapping data - while NORMALIZE performs the
alignment while also retaining the non-overlapping data.

The rest of the syntax seems to deal with selecting subsets of range
records based upon attribute data.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Karl O. Pinc 2017-02-15 20:53:34 Re: Patch to implement pg_current_logfile() function
Previous Message Jeff Janes 2017-02-15 20:25:05 Re: operator_precedence_warning vs make installcheck