================================================================================ ALIGN ================================================================================ This is an exemplary walkthrough of a temporal ALIGN operation, from the query input, over the query rewrite, until the result tuple outputs during execution. EXAMPLE ================================================================================ These tables represent employees and projects in a company. An employee works for a department over a certain time period. Different projects get developed in a department. The start and end time points of each project is stored in the period column. CREATE TABLE emp (empname VARCHAR, dept VARCHAR, period INT4RANGE); INSERT INTO emp VALUES ('Ann', 'DB', '[1,5)'), ('Bea', 'DB', '[3,8)'), ('Ron', 'AI', '[6,9)'); CREATE TABLE prj (prjname VARCHAR, dept VARCHAR, period INT4RANGE); INSERT INTO prj VALUES ('PR1', 'DB', '[3,7)'), ('PR2', 'DB', '[1,5)'), ('PR3', 'HW', '[2,8)'); Timeline representation ----------------------- EMPNAME DEPT PERIOD Ann DB [1,5) ---- <- Tuple 1, valid from 1 to 5 excl. Bea DB [3,8) ----- <- Tuple 2 Ron AI [6,9) --- <- Tuple 3 123456789 <- Timeline PRJNAME DEPT PERIOD PR1 DB [3,7) ---- <- Tuple 1, valid from 3 to 7 excl. PR2 DB [1,5) ---- <- Tuple 2 PR3 HW [2,8) ------ <- Tuple 3 123456789 <- Timeline TEMPORAL LEFT OUTER JOIN query ------------------------------ Query: At each time point, to which projects is an employee assigned, and when does an employee not have an assigned project? WITH emp AS (SELECT period u, * FROM emp), prj AS (SELECT period v, * FROM prj) SELECT empname, prjname, dept, period FROM ( emp ALIGN prj ON emp.dept = prj.dept WITH (emp.period, prj.period) ) emp_aligned LEFT OUTER JOIN ( prj ALIGN emp ON emp.dept = prj.dept WITH (prj.period, emp.period) ) prj_aligned USING(dept, period) WHERE period = u * v OR u IS NULL OR v IS NULL; Result ------ empname | prjname | dept | period ---------+---------+------+-------- Ann | PR2 | DB | [1,5) ---- Ann | PR1 | DB | [3,5) -- Bea | PR1 | DB | [3,7) ---- Bea | PR2 | DB | [3,5) -- Bea | | DB | [7,8) - Ron | | AI | [6,9) --- 123456789 <- Timeline STEP-BY-STEP EXPLANATION ================================================================================ In this chapter we describe first the rewrite and processing of the two ALIGN clauses, that are, ( emp ALIGN prj ON emp.dept = prj.dept WITH (emp.period, prj.period) ) emp_aligned ...and... ( prj ALIGN emp ON emp.dept = prj.dept WITH (prj.period, emp.period) ) prj_aligned. Secondly, we explain what the above query does in general, and why we need two ALIGNs, the WHERE-clause and CTEs (WITH clauses). ALIGN subquery processing ------------------------- After receiving the ALIGN query (see above) as input, we rewrite it into the following query: SELECT emp.*, GREATEST(LOWER(emp.period), LOWER(prj.period)) p1, LEAST(UPPER(emp.period), UPPER(prj.period)) p2 FROM (SELECT *, row_id() OVER () rn FROM emp) emp LEFT OUTER JOIN prj ON emp.dept = prj.dept AND emp.period && prj.period ORDER BY rn, p1, p2; Then, we rewrite the second ALIGN subquery analoguously. Intermediate results: These are the inputs of our ALIGN execution nodes ----------------------------------------------------------------------- (See Appendix [1] for details) For emp ALIGN prj... TUPID empname | dept | period | rn | p1 | p2 ---------+------+--------+----+----+---- A Ann | DB | [1,5) | 1 | 1 | 5 B Ann | DB | [1,5) | 1 | 3 | 5 C Bea | DB | [3,8) | 2 | 3 | 5 D Bea | DB | [3,8) | 2 | 3 | 7 E Ron | AI | [6,9) | 3 | 6 | 9 For prj ALIGN emp... TUPID prjname | dept | period | rn | p1 | p2 ---------+------+--------+----+----+---- a PR1 | DB | [3,7) | 1 | 3 | 7 b PR1 | DB | [3,7) | 1 | 3 | 5 c PR2 | DB | [1,5) | 2 | 3 | 5 d PR2 | DB | [1,5) | 2 | 1 | 5 e PR3 | HW | [2,6) | 3 | 2 | 6 We have four possibilities inside ALIGN during execution -------------------------------------------------------- (See Appendix [2] for details) 1) CURR == PREV a) S < CURR.P1 --> generate (CURR, [S, CURR.P1)) and set S = CURR.P1 b) otherwise: if OUT != CURR: generate (CURR, [CURR.P1, CURR.P2)) set S = MAX(S, CURR.P2) fetch next tuple if fetched tuple is NULL --> goto(2) forcing CURR != PREV to be true 2) CURR != PREV a) S < PREV.te --> generate (PREV, [S, PREV.te)) set S = CURR.ts and goto(1) forcing CURR == PREV to be true Executor steps for "emp ALIGN prj ON emp.dept = prj.dept..." ------------------------------------------------------------ 1) First call of ExecTemporalAdjustment: Fetch tuple A, which is now CURR. Set sweepline S = 1 (A.ts), and copy tuple A also into PREV. Goto(1) forcing CURR == PREV to be true. The last result tuple (OUT) is NULL. 2) (1): CURR == PREV, that is, tuple A is equal to itself. (1b): OUT == NULL, hence we generate a result tuple: (Ann, DB, [1,5)) ...where the interval is [CURR.P1, CURR.P2) and all other attributes are copied from the current tuple (omitting helper columns: RN, P1, and P2). This result is now the new OUT. We update the sweepline: S = 5 <-- MAX(1, 5). Fetch tuple B; CURR = B, PREV = A 3) (1): A == B. (1b): OUT != CURR (A.ts != B.p1), hence we generate a result tuple: (Ann, DB, [3,5)) Fetch tuple C; CURR = C, PREV = B 4) (2): B != C. We update the sweepline: S = 3 (CURR.ts) We set PREV = C. Goto(1) forcing CURR == PREV to be true. 5) (1): C == C. (1b): OUT != CURR (OUT.rn != CURR.rn), hence we generate a result tuple: (Bea, DB, [3,5)) We update the sweepline: S = 5 (CURR.p2) Fetch tuple D; CURR = D, PREV = C 6) (1): C == D. (1b): OUT != CURR (OUT.te != CURR.p2), hence we generate a result tuple: (Bea, DB, [3,7)) We update the sweepline: S = 7 (CURR.p2) Fetch tuple E; CURR = E, PREV = D 7) (2): D != E. (2a): S < D.te (7 < 8), hence we generate a result tuple: (Bea, DB, [3,7)) We update the sweepline: S = 6 (CURR.ts) We set PREV = E. Goto(1) forcing CURR == PREV to be true. 8) (1): E == E. (1b): OUT != CURR (OUT.ts != CURR.p1), hence we generate a result tuple: (Ron, AI, [6,9)) We update the sweepline: S = 9 (CURR.p2) No new tuple can be fetched, nor is any new result tuple produced, hence we terminate our executor. The result of "emp ALIGN prj..." is as follows: empname | dept | period ---------+------+-------- Ann | DB | [1,5) Ann | DB | [3,5) Bea | DB | [3,5) Bea | DB | [3,7) Bea | DB | [7,8) Ron | AI | [6,9) Whereas, the result of "prj ALIGN emp..." is the following: prjname | dept | period ---------+------+-------- PR1 | DB | [3,5) PR1 | DB | [3,7) PR2 | DB | [1,5) PR2 | DB | [3,5) PR3 | HW | [2,6) The initial goal was to find at each time point, for which projects inside a department is an employee responsible. These can now be easily done from these aligned inputs with standard (reused) PostgreSQL operators. In our example a regular LEFT OUTER JOIN, which gives the output below: empname | prjname | dept | period ---------+---------+------+-------- Ann | PR2 | DB | [1,5) Ann | PR1 | DB | [3,5) Ann | PR2 | DB | [3,5) Bea | PR1 | DB | [3,7) Bea | PR1 | DB | [3,5) Bea | PR2 | DB | [3,5) Bea | | DB | [7,8) Ron | | AI | [6,9) As you can see, some result tuples are already covered by others. For instance, (Ann, PR2, DB, [3,5)) is already covered by (Ann, PR2, DB, [1,5)). Therefore, we need an "absorbing" WHERE-clause, and CTEs (WITH-clauses) to compare the original periods with the new result tuples. After nesting the JOIN within these clauses the final result is: empname | prjname | dept | period ---------+---------+------+-------- Ann | PR2 | DB | [1,5) Ann | PR1 | DB | [3,5) Bea | PR1 | DB | [3,7) Bea | PR2 | DB | [3,5) Bea | | DB | [7,8) Ron | | AI | [6,9) Best regards, Anton, Michael, Johann, Peter -------------------------------------------------------------------------------- Appendix: [1] After the intermediate result we get an input, which is splitted into so-called temporal groups. Each of these groups satisfy the USING-condition (a.dept = b.dept), has overlapping periods, and a row number as ID (rn). The input is ordered by temporal group ID, and the start and ending timepoints, i.e., P1 and P2. Please note, that each tuple in a certain rn ID is always equal to any other tuple in that rn ID, except for P1 and P2. [2] Additinal information for ALIGN: - sweepline: to sweep from left-to-right on the time line (S) - slots to the current and previous tuple (CURR, PREV) - Two tuples A and B are equal, i.e. A == B, if A.rn == B.rn - ts means LOWER(time), and te means UPPER(time) - OUT is the last produced output tuple (last result tuple) - OUT != CURR, if OUT.rn != CURR.rn or OUT.ts != CURR.p1 or OUT.te != CURR.p2 or OUT == NULL