Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Moser <pitiz29a(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
Date: 2017-11-30 17:26:50
Message-ID: CA+TgmoY8bhOGKL6v-gdKc4MRaNbFn46ddn4gXvK15f1VVB2Lrg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 21, 2017 at 4:36 AM, Peter Moser <pitiz29a(at)gmail(dot)com> wrote:
> Hi hackers,
> we'd like to rethink our approach...
>
> For simplicity I'll drop ALIGN for the moment and focus solely on NORMALIZE:
>
> SELECT * FROM (R NORMALIZE S ON R.x = S.y WITH (R.time, S.time)) c;
>
> Our normalization executor node needs the following input (for now
> expressed in plain SQL):
>
> SELECT R.*, p1
> FROM (SELECT *, row_number() OVER () rn FROM R) R
> LEFT OUTER JOIN (
> SELECT y, LOWER(time) p1 FROM S
> UNION
> SELECT y, UPPER(time) p1 FROM S
> ) S
> ON R.x = S.y AND p1 <@ R.time
> ORDER BY rn, p1;
>
> In other words:
> 1) The left subquery adds a unique ID to each tuple (i.e., rn).
> 2) The right subquery creates two results for each input tuple: one for
> the upper and one for the lower bound of each input tuple's valid time
> column. The boundaries get put into a single (scalar) column, namely p1.
> 3) We join both subqueries if the normalization predicates hold (R.x = S.y)
> and p1 is inside the time of the current outer tuple.
> 4) Finally, we sort the result by the unique ID (rn) and p1, and give all
> columns of the outer relation, rn and p1 back.

So, if I understand correctly, there are three possible outcomes. If
S.time and R.time are non-overlapping, then a null-extended row is
produced with the original data from R. If S.time overlaps R.time but
is not contained within it, then this produces one result row, not
null-extended -- either the upper or the lower bound will be found to
be contained within R.time, but the other will not. If S.time is
contained within R.time, then this produces two result rows, one with
p1 containing S.time's lower bound and one with p1 containing S.time's
upper bound. Is that right? Then, after all this
happens, the temporal normalizer code runs over the results and uses
the p1 values as split points for the ranges from R.
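
To make that concrete with some made-up toy data (int4range is used
purely for illustration; nothing here is from the proposal itself):

SELECT R.*, p1
FROM (VALUES (1, int4range(1,10))) R(x, time)
LEFT OUTER JOIN (
    SELECT y, LOWER(time) p1
    FROM (VALUES (1, int4range(3,5)), (1, int4range(8,12))) S(y, time)
    UNION
    SELECT y, UPPER(time)
    FROM (VALUES (1, int4range(3,5)), (1, int4range(8,12))) S(y, time)
) S ON R.x = S.y AND p1 <@ R.time
ORDER BY p1;

-- p1 comes back as 3, 5, and 8 (12 falls outside [1,10)), so the
-- normalizer would split R's [1,10) into [1,3), [3,5), [5,8), [8,10).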

I wonder if we could instead think about R NORMALIZE S ON R.x = S.y
WITH (R.time, S.time) as an ordinary join on R.x = S.y with some extra
processing afterwards. After finding all the join partners for a row
in R, extract all the lower and upper bounds from the S.time fields of
the join partners and use that to emit as many rows from R as may be
necessary. The main problem that I see with that approach is that it
seems desirable to separate the extra-processing-afterwards step into
a separate executor node, and there's no way for a separate type of
plan node to know when the join advances the outer side of the join.
That's the purpose that row_number() is serving as you have it; we'd
to invent some other kind of signalling mechanism, which might not be
entirely trivial. :-(
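
Just to pin down what that join-then-split step would have to
produce, here is one way to write the intended semantics declaratively
(an untested sketch; it reuses the table and column names from above,
assumes int4range columns purely for concreteness, and glosses over
empty-range edge cases):

SELECT R.x, int4range(b.ps[i], b.ps[i + 1]) AS time
FROM R
CROSS JOIN LATERAL (
    SELECT array_agg(p ORDER BY p) ps
    FROM (
        SELECT LOWER(R.time) p
        UNION
        SELECT UPPER(R.time)
        UNION
        SELECT LOWER(S.time) FROM S
        WHERE S.y = R.x AND LOWER(S.time) <@ R.time
        UNION
        SELECT UPPER(S.time) FROM S
        WHERE S.y = R.x AND UPPER(S.time) <@ R.time
    ) bounds
) b
CROSS JOIN LATERAL generate_series(1, array_length(b.ps, 1) - 1) i;

With R.time = [1,10) and a single partner with S.time = [3,5), the
collected bounds are {1,3,5,10}, giving [1,3), [3,5), [5,10); with no
partners at all, the row comes back unsplit.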

If we could do it, though, it might be quite a bit more efficient,
because it would avoid scanning S twice and performing a UNION on the
results of those scans. Also, we wouldn't necessarily need to sort
the whole set of rows from R, which I suspect is unavoidable in your
implementation. We'd just need to sort the individual groups of rows
from S, and my guess is that in many practical cases those groups are
fairly small.
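
Incidentally, even at the SQL level the two scans of S could probably
be collapsed into one with a lateral VALUES list -- an untested
sketch; note that unlike the UNION it doesn't deduplicate (y, p1)
pairs, so a DISTINCT might be needed to match exactly:

SELECT R.*, p1
FROM (SELECT *, row_number() OVER () rn FROM R) R
LEFT OUTER JOIN (
    SELECT S.y, b.p1
    FROM S, LATERAL (VALUES (LOWER(S.time)), (UPPER(S.time))) b(p1)
) S ON R.x = S.y AND p1 <@ R.time
ORDER BY rn, p1;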

I wonder what identities hold for NORMALIZE. It does not seem to be
commutative, but it looks like it might be associative... i.e. (R
NORMALIZE S) NORMALIZE T produces the same output as R NORMALIZE (S
NORMALIZE T), perhaps?
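
For a toy non-commutativity check: with R = {[1,10)}, S = {[3,5)},
and an always-true predicate, R NORMALIZE S splits R's range into
[1,3), [3,5), [5,10), whereas S NORMALIZE R leaves [3,5) alone,
because neither bound of R's range falls inside it. Associativity
would need a more careful argument.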

> Our first attempt to understand the new approach would be as follows: The
> left base rel of the inner left-outer-join can be expressed as a WindowAgg
> node. However, the right query of the join is much more difficult to build
> (maybe through hash aggregates). Both queries could be put together with a
> MergeJoin for instance. However, if we create the plan tree by hand and
> choose algorithms for it manually, how is it possible to have it optimized
> later? Or, if that is not possible, how do we choose the best algorithms
> for it?

You don't want to generate plan nodes directly like this, I think.
Generally, you create paths, and the cheapest path wins at the end of
planning. The thing that makes this tricky is that the temporal
normalization phase can be separated from the inner left-outer-join by
other joins, and the planner doesn't really have a good way of
representing that concept.

More broadly, the very idea of creating plan nodes suggests that you
are approaching this from the point of view of sticking the logic in
near the end of the planning process, which I think is not going to
work. Whatever is going to happen here needs to happen at path
generation time at the latest, or maybe even earlier, before the
optimizer even starts processing the join tree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
