Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

From: Peter Moser <pitiz29a(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Anton Dignös <dignoes(at)inf(dot)unibz(dot)it>, Johann Gamper <gamper(at)inf(dot)unibz(dot)it>, Michael Böhlen <boehlen(at)ifi(dot)uzh(dot)ch>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
Date: 2018-09-07 11:02:26
Message-ID: 87682356-8fdf-217c-bcfb-4d4d91aa6e8c@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/26/2018 07:55 AM, Peter Moser wrote:
> We have now a new approach to plan and execute NORMALIZE as a special
> join node of type NORMALIZE, an append-plan on the inner join path,
> and a merge-join executor. For the latter, we would need to
> extend nodeMergejoin.c with an point-in-range-containment join.

We are ready with a new prototype for the temporal NORMALIZE operation.
In this prototype we do not rewrite queries as in the previous patch,
but have one executor node, that solves the normalize operation. This
executor is based on a merge-join.

Our new patch is based on top of
75f7855369ec56d4a8e7d6eae98aff1182b85cac from September 6, 2018.

The syntax is
SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c;

It currently is only implemented for empty USING clauses, and solely
int4range as range attributes.

Example:

A=# table r;

a | b | period_r

---+---+----------

a | B | [1,7)

b | B | [3,9)

c | G | [8,10)

(3 rows)

A=# table s;

c | d | period_s

---+---+----------

1 | B | [2,5)

2 | B | [3,4)

3 | B | [7,9)

(3 rows)

A=# SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c;

period_r | a | b

----------+---+---

[1,2) | a | B

[2,3) | a | B

[3,4) | a | B

[4,5) | a | B

[5,7) | a | B

[3,4) | b | B

[4,5) | b | B

[5,7) | b | B

[7,9) | b | B

(9 rows)

A=# EXPLAIN SELECT * FROM (r NORMALIZE s USING() WITH(period_r,
period_s)) c;
QUERY PLAN

--------------------------------------------------------------------------

Result (cost=2.15..2.22 rows=3 width=18)

-> Merge ??? Join (cost=2.15..2.23 rows=3 width=22)

Merge Cond: (r.period_r @> (range_split(s.period_s)))

-> Sort (cost=1.05..1.06 rows=3 width=18)

Sort Key: r.period_r

-> Seq Scan on r (cost=0.00..1.03 rows=3 width=18)

-> Sort (cost=1.09..1.10 rows=3 width=4)

Sort Key: (range_split(s.period_s)) USING <

-> ProjectSet (cost=0.00..1.07 rows=3 width=4)

-> Seq Scan on s (cost=0.00..1.03 rows=3
width=14)
(10 rows)

> That is, we create a new join path within sort_inner_and_outer
> (joinpath.c). First two projection nodes to extract the start- and
> end-timepoints of the range type used as interval, and above an
> append-plan to merge both subplans. In detail, outer_path is just sort,
> whereas inner_path is append of (B, ts) projection with (B, te)
> projection.

We changed this implementation and use a set-returning function called
"range_split", that extracts the upper and lower bound of a range and
returns two tuples. For instance, a tuple '[4,10),a' becomes two tuples
of the form '4,a' and '10,a'.

> Hereby, B is a set of non-temporal attributes used in join equality
> predicates, and [ts,te) forms the valid-time interval. Non-equality
> predicates must be handled separately as a filter step.

The current prototype supports only an integer range-type without any
additional non-temporal attributes (empty USING clause).

> Do you think, it is a good idea to extend the sort_inner_and_outer()
> with a new branch, where jointype == NORMALIZE and add the projection
> and append sub-paths there?

We actually extended sort_inner_and_outer now. It is an early solution,
to support discussions. Please see the two sections starting with "if
(jointype == JOIN_TEMPORAL_NORMALIZE)" inside sort_inner_and_outer:

The purpose of these sections is to change the inner path's range type
into its single bounds.

We accomplish this with a new function called range_split. We take the
inner clause and extract the second operator of an RANGE_EQ expression
out of it. We assume *for this prototype*, that their is only one such
operator and that it is solely used for NORMALIZE. Then, we replace it
with range_split. A range split returns a set of tuples, hence we add a
new "set projection path" above the inner path, and another sort path
above that.

What we like to discuss now is:
- Is sort_inner_and_outer the correct place to perform this split?
- How could we support OID_RANGE_ELEM_CONTAINED_OP for a NORMALIZE
mergejoin executor? If we use RANGE_ELEM_CONTAINED as operator, it is
not an equality operator, but if we use RANGE_EQ it assumes that the
right-hand-side of the operator must be a range as well.
- Should we better change our mergeclause to a RANGE_ELEM_CONTAINED
comparison, or keep RANGE_EQ and fix pathkeys later?
- How do we update equivalence classes, pathkeys, and any other struct,
when changing the inner relation's data type from "int4range" to "int"
in the query tree inside "sort_inner_and_outer" to get the correct
ordering and data types

Best regards,
Anton, Johann, Michael, Peter

Attachment Content-Type Size
tpg_normalize_v2.patch text/x-patch 47.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Adrien NAYRAT 2018-09-07 12:16:46 Standby reads fail when autovacuum take AEL during truncation
Previous Message Rajkumar Raghuwanshi 2018-09-07 10:32:13 cache lookup failed for constraint when alter table referred by partition table