================================================================================ NORMALIZE ================================================================================ This is an exemplary walkthrough of a temporal NORMALIZE operation, from the query input, over the query rewrite, until the result tuple outputs during execution. EXAMPLE -------------------------------------------------------------------------------- Question: How many employees work in a department at each time point? CREATE TABLE empl (name VARCHAR, dept VARCHAR, time INT4RANGE); INSERT INTO empl VALUES ('Ann', 'DB', '[1,5)'), ('Ann', 'AI', '[3,8)'), ('Bea', 'DB', '[3,9)'); Timeline representation: NAME DEPT TIME Ann DB [1,5) ---- <- Tuple 1, valid from 1 to 5 excl. Ann AI [3,8) ----- <- Tuple 2 Bea DB [3,9) ------ <- Tuple 3 123456789 <- Timeline NORMALIZE query: SELECT count(*), dept, time FROM ( empl a NORMALIZE empl b USING(dept) WITH (a.time, b.time) ) c GROUP BY dept, time; Result: COUNT DEPT TIME 1 DB [1,3) -- 1 AI [3,8) ----- 2 DB [3,5) -- 1 DB [5,9) ---- 123456789 <- Timeline STEP-BY-STEP EXPLANATION -------------------------------------------------------------------------------- After receiving the NORMALIZE query (see above) as input, we rewrite it into the following query: SELECT a.*, P1 FROM ( SELECT *, row_id() OVER () rn FROM empl ) a LEFT OUTER JOIN ( SELECT dept, LOWER(time) P1 FROM empl UNION SELECT dept, UPPER(time) P1 FROM empl ) b ON a.dept = b.dept AND P1 <@ a.time ORDER BY rn, P1; Intermediate result: This is the input of our NORMALIZE execution node... (See Appendix [1] for details) TUPID name | dept | time | rn | p1 ------+------+-------+----+---- A Ann | DB | [1,5) | 1 | 1 B Ann | DB | [1,5) | 1 | 3 C Ann | AI | [3,8) | 2 | 3 D Bea | DB | [3,9) | 3 | 3 E Bea | DB | [3,9) | 3 | 5 We have three possibilities inside NORMALIZE during execution: (See Appendix [2] for details) 1) CURR == PREV a) S < CURR.P1 --> generate (CURR, [S, CURR.P1)) and set S = CURR.P1 b) otherwise --> fetch next tuple if fetched tuple is NULL --> goto(2) forcing CURR != PREV to be true 2) CURR != PREV a) S < PREV.te --> generate (PREV, [S, PREV.te)) ... set S = CURR.ts and goto(1) forcing CURR == PREV to be true Executor steps for our example query: 1) First call of ExecTemporalAdjustment: Fetch tuple A, which is now CURR. Set sweepline S = 1 (A.ts), and copy tuple A also into PREV. Goto(1) forcing CURR == PREV to be true. 2) (1): CURR == PREV, that is, tuple A is equal to itself. (1b): Fetch tuple B; CURR = B, PREV = A 3) (1): A == B (1a): S < B.P1 (1 < 3), hence we generate a result tuple: (Ann, DB, [1,3)) ...where the interval is [S, CURR.P1) and all other attributes are copied from the current tuple (omitting helper columns: RN and P1) We update the sweepline: S = 3 (B.P1). 4) (1): A == B. (1b): Fetch tuple C; CURR = C, PREV = B 5) (2): B != C. (2a): S < B.te (3 < 5), hence we generate a result tuple: (Ann, DB, [3, 5)) We update the sweepline: S = 3 (C.ts). 6) (1): B == C (forced). (1b): Fetch tuple D; CURR = D, PREV = C 7) (2): C != D. (2a): S < C.te (3 < 8), hence we generate a result tuple: (Ann, AI, [3,8)) We update the sweepline: S = 3 (D.ts). 8) (1): C == D (forced). (1b): Fetch tuple E; CURR = E, PREV = D 9) (1): D == E. (1a): S < E.P1 (3 < 5), hence we generate a result tuple: (Bea, DB, [3, 5)) We update the sweepline: S = 5 (E.P1). 10) (1): D == E. (1b): Fetch tuple gives NULL 11) (2): D != E (forced). (2a): S < E.te (5 < 9), hence we generate a result tuple: (Bea, DB, [5, 9)) We update the sweepline: S = 3 (E.ts). No new tuple can be fetched, nor is any new result tuple produced, hence we terminate our executor. The result of NORMALIZE is as follows: name | dept | time ------+------+------- Ann | DB | [1,3) Ann | DB | [3,5) Ann | AI | [3,8) Bea | DB | [3,5) Bea | DB | [5,9) The initial goal was to aggregate over "dept" and "time", and output the count of each. These can now be easily done from this normalized input with standard (reused) PostgreSQL operators. Final result is: count | dept | time -------+------+------- 1 | DB | [1,3) 1 | AI | [3,8) 2 | DB | [3,5) 1 | DB | [5,9) Best regards, Anton, Michael, Johann, Peter -------------------------------------------------------------------------------- Appendix: [1] After the intermediate result we get an input, which is splitted into so-called temporal groups. Each of these groups satisfy the USING-condition (a.dept = b.dept), has overlapping periods, and a row number as ID (rn). The input is ordered by temporal group ID, and the start and ending timepoints, i.e., P1. Temporal normalizers do not make a distinction between start and end timepoints while grouping, therefore we have only one timepoint attribute here (i.e., P1), which is the union of start and end timepoints. Please note, that each tuple in a certain rn ID is always equal to any other tuple in that rn ID. [2] Additinal information for NORMALIZE: - sweepline: to sweep from left-to-right on the time line (S) - slots to the current and previous tuple (CURR, PREV) - Two tuples A and B are equal, i.e. A == B, if A.rn == B.rn - ts means LOWER(time), and te means UPPER(time)