Documentation example for RECURISVE WITH isn't what I would expect

From: Alex Sherwin <alex(dot)sherwin(at)gmail(dot)com>
To: pgsql-docs(at)postgresql(dot)org
Subject: Documentation example for RECURISVE WITH isn't what I would expect
Date: 2011-11-25 14:43:27
Message-ID: CAMqGX520K=3onSR45KQb=7iwk9V2Xt43V267K+Lx+XZcA=K0aw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs

The 9.1 docs for RECURSIVE WITH:
http://www.postgresql.org/docs/9.1/static/queries-with.html eventually
builds up to this example query:

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
SELECT g.id, g.link, g.data, 1,
ARRAY[g.id],
false
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1,
path || g.id,
g.id = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;

From reading the docs and this query, it would be my understanding that the
recursive portion (after UNION ALL) can refer to itself, which is what it
appears this query does with the definition of cycle as "g.id = ANY(path)"
then later referring to it in the WHERE clause "ANT NOT cycle", meaning
that I would expect this to exclude duplicate traversals.

But, in my test this is not the case, I have to modify the query (complete
working example below) as follows to make it work as the docs (I believe)
intended:

drop table graph;
create table graph (
id varchar not null primary key,
link varchar null,
data varchar null);

insert into graph (id, link, data) values ('a', null, 'a');
insert into graph (id, link, data) values ('ab', 'a', 'ab');
insert into graph (id, link, data) values ('ac', 'a', 'ac');
insert into graph (id, link, data) values ('aca', 'ac', 'aca');
insert into graph (id, link, data) values ('acb', 'ac', 'acb');
insert into graph (id, link, data) values ('acba', 'acb', 'acba');
insert into graph (id, link, data) values ('acbb', 'acb', 'acbb');
insert into graph (id, link, data) values ('acc', 'ac', 'acc');
insert into graph (id, link, data) values ('ad', 'a', 'ad');

with recursive search_graph(id, link, data, depth, path, cycle) as
(
select
g.id, g.link, g.data, 1, array[g.id], false
from graph g
union all
select
g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id=any(path || g.id)
from graph g, search_graph sg
where g.id=sg.link and not cycle and not g.id=any(path || g.id)
)
select * from search_graph

Without the change "g.id=any(path || g.id)", cycle is never set to true.
If this were the only change, the first level ('a') would still be
re-visited, but stop any further duplicate visits from there. In order to
prevent 'a' from being re-visited altogether, the WHERE must be modified as
well with "and not g.id=any(path || g.id)", which essentially negates the
need for cycle to begin with and can be re-factored as:

with recursive search_graph(id, link, data, depth, path) as
(
select
g.id, g.link, g.data, 1, array[g.id]
from graph g
union all
select
g.id, g.link, g.data, sg.depth + 1, path || g.id
from graph g, search_graph sg
where g.id=sg.link and not g.id=any(path || g.id)
)
select * from search_graph

Unless I'm missing something here.. but I don't think the example in the
docs works correctly.

To apply the docs example to my data set, would look like:

with recursive search_graph(id, link, data, depth, path, cycle) as
(
select
g.id, g.link, g.data, 1, array[g.id], false
from graph g
union all
select
g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id=any(path)
from graph g, search_graph sg
where g.id=sg.link and not cycle
)
select * from search_graph

Which re-visits many rows

--
Alexander Sherwin

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Tom Lane 2011-11-26 17:10:12 Re: Documentation example for RECURISVE WITH isn't what I would expect
Previous Message Daniele Varrazzo 2011-11-25 13:54:27 Wrong advisory locks docs in pg_locks