RFP: Recursive query in 8.4

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: RFP: Recursive query in 8.4
Date: 2008-02-19 08:36:00
Message-ID: 20080219.173600.116605002.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

As I promised before we would like to propose implementing the
recursive query as defined in the standard for PostgreSQL 8.4.

The work is supported by Sumitomo Electric Information Systems Co.,
Ltd. (http://www.sei-info.co.jp/) and SRA OSS, Inc. Japan
(http://www.sraoss.co.jp).

1. Overview

We propose to implement the recursive query (WITH RECURSIVE clause)
defined in SQL:1999 and later. With the recursive query, one can
easily inquire the data expressed as tree and graph structures. The
actual syntax we prefer is the one defined in SQL:2008 (it's not
published yet, but I have a closest draft).

We do not propose the WITH clause without RECURSIVE key word here
since someone else has already made a proposal for this.
(http://archives.postgresql.org/pgsql-patches/2008-01/msg00105.php)

2. Example

For those who are not familiar with the recursive query, I include an
example:

CREATE TABLE department (
id INT PRIMARY KEY,
parent_department INT REFERENCES department,
name TEXT
);

INSERT INTO department VALUES (0, NULL, 'ROOT');
INSERT INTO department VALUES (1, 0, 'A');
INSERT INTO department VALUES (2, 1, 'B');
INSERT INTO department VALUES (3, 2, 'C');
INSERT INTO department VALUES (4, 2, 'D');
INSERT INTO department VALUES (5, 0, 'E');
INSERT INTO department VALUES (6, 4, 'F');
INSERT INTO department VALUES (7, 4, 'G');

This will represent a tree structure of an organization:

ROOT ---> A ---> B ---> C ---> F
| |
| +----> D
|
+-----> E ---> G

If you want to extract all departments "under" A, you could use a
recursive query:

WITH RECURSIVE subdepartment AS
(
--
SELECT * FROM department WHERE id = 'A'

UNION ALL

-- recursive term referring to "subdepartment"
SELECT d.* FROM department AS d, subdepartment AS sd
WHERE d.id = sd.parent_department
)
SELECT * FROM subdepartment;

This will return A, B, C, D and F.

2. The syntax

As described above, we refers to the SQL:2008's WITH RECURSIVE clause syntax.

WITH RECURSIVE clause ::=

WITH RECURSIVE <query name> AS ( <table subquery> )
[ SEARCH clause | CYCLE clause ]
<SELECT body>

In the example above, <query name> is "subdepartment" and <table
subquery> is SELECTs in pareses.

<table subquery> must have one or more "anchor" expressions. This is
required by the standard.

The anchor expressions are consisted with "none recursive term"
(SELECT * FROM department WHERE id = 'A') + UNION ALL + "recursive
term" (SELECT d.* FROM department AS d, subdepartment AS sd WHERE d.id
= sd.parent_department). <SELECT body> is "SELECT * FROM subdepartment".

Note that the standard allows to use an UNION without ALL. However
this proposal only allow UNION ALL due to an implementation reason.

Other limitations required by the standard include:

- aggregate functions are not allowed in the recursive term

- GROUP BY is not allowed in the recursive term

- outer joins are not allowed in the recursive term

3. Processing a recursive query

If a WITH clause includes a recursive referencing cycle, we call the
set of <with list elements> as "partition". In the example above,
there is a partition in which subdepartment referees to itself. We
limit number of list elements in a partition up to 1, which means it
should be a self reference.

While processing a recursive query, we start with a partition which
does not depend on any other partitions.

There is a working table WT and an intermediate table RT to evaluate a
partition. We implement WT and RT using tuplestore.

The algorithm is shown below.

[using the width first search]

1. evaluate non recursive term or partition depending on other
partitions and assign the result to RT

2. execute recursive terms

2.1 WT := RT
2.2 while WT is not empty repeat 2.3 to 2.6. if WT is empty returns RT
2.3 replace the name of recursive term with WT
2.4 evaluate the recursive term and store into WT
2.5 append WT to RT
2.6 go back to 2.2

Pseudo code shown below.

1. RT := none recursive query result
2. for i = 1..N (N = number of partitions)
2.1 WT := RT
2.2 while !empty(WT); do
2.3 subdepartment := WT
2.4 WT := result of recursive term
2.5 RT := WT UNION ALL RT
2.6 done

Execution trace shown below.

WITH RECURSIVE subdepartment AS
(
-- non recursive term
SELECT * FROM department WHERE id = 'A'

UNION ALL

-- recursive term referring to "subdepartment"
SELECT d.* FROM department AS d, subdepartment AS sd
WHERE d.id = sd.parent_department
)
SELECT * FROM subdepartment;

1)
RT = {'A'} WT = {'A'}

2)
SELECT d.* FROM department AS d, WT({'A'}) AS sd
WHERE d.id = sd.parent_id

WT = {'B'}

RT = RT({'A'}) UNION ALL(*) WT({'B'})
=> RT = {'A', 'B'}

3)
SELECT d.* FROM department AS d, WT({'B', 'C'}) AS sd
WHERE d.id = sd.parent_id

=> WT = {'C', 'D'}

RT = RT({'A', 'B'}) UNION ALL WT({'C', 'D'})
=> RT = {'A', 'B', 'C', 'D'}

4)
SELECT d.* FROM department AS d, WT({'B', 'C'}) AS sd
WHERE d.id = sd.parent_id

=> WT = {'F'}

RT = RT({'A', 'B', 'C', 'D'}) UNION ALL WT({'F'})
=> RT = {'A', 'B', 'C', 'D', 'F'}

5)
SELECT d.* FROM department AS d, WT({'B', 'C'}) AS sd
WHERE d.id = sd.parent_id

=> WT = {}

RT = RT({'A', 'B', 'C', 'D', 'F'}) UNION ALL WT({}) <--(1)
=> RT = {'A', 'B', 'C', 'D', 'F'}

6)
Since WT is empty, the execution stops. RT = {'A', 'B', 'C', 'D', F'} (the result)

(1) Actually we do not execute UNION ALL. We use tuplestore_puttuple()
to add results to RT.

4. About GUC parameters

We do not add new GUC parameters.

5. Limitation with PostgreSQL

1) we do not implement SEARCH clause and CYCLE clause. This is because
we need array of rows to implement them. Note that there's no
support for array of rows in PostgreSQL.

2) we only allow UNION ALL while appending none recursive term and
recursive terms. This is because it's difficult to remove
duplications using tuplestore. Note that Firebird and MS SQL support
only UNION as well.

[1] http://www.firebirdsql.org/rlsnotesh/rlsnotes210.html#rnfb210-cte
[2] http://msdn2.microsoft.com/en-us/library/ms186243.aspx

3) no support for mutually recursive queries

In the parser we throw an error if there's a mutually recursive query.

rec1 -> rec2 -> rec1 -> ...

Note that Firebird and MS SQL do not support mutually recursive
queries either.

6. Problems

1) while processing recursive queries, we repeat JOIN operations many
times. JOIN methods can be nested loop, merge join, or hash join.
Depending on the number of rows and etc., the optimal join method
may vary. However we have no way to change the join method
dynamically according to increasing the number of rows.

2) it's not clear for the planner how to estimate the cost of
recursive queries.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark 2008-02-19 08:56:04 Re: ANALYZE to be ignored by VACUUM
Previous Message ITAGAKI Takahiro 2008-02-19 07:31:20 Re: ANALYZE to be ignored by VACUUM