Quick Links

Re: directory tree query with big planner variation

From:	Axel Rau <Axel(dot)Rau(at)Chaos1(dot)DE>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: directory tree query with big planner variation
Date:	2006-07-31 15:06:00
Message-ID:	A57707A5-9987-4991-B775-6892B3DDD7DC@Chaos1.DE
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Am 31.07.2006 um 15:30 schrieb Michael Stone:

> If I understand the intend of this SQL,
Let me show the tables first:
Table "bacula.path" ( 65031 rows)
Column | Type | Modifiers
--------+---------
+-------------------------------------------------------
pathid | integer | not null default nextval('path_pathid_seq'::regclass)
path | text | not null ( complete pathnames of all
directories )
Indexes:
"path_pkey" PRIMARY KEY, btree (pathid)
"path_name_idx" btree (path)

Table "bacula.filename" ( 160559 rows)
Column | Type | Modifiers
------------+---------
+---------------------------------------------------------------
filenameid | integer | not null default nextval
('filename_filenameid_seq'::regclass)
name | text | not null
Indexes:
"filename_pkey" PRIMARY KEY, btree (filenameid)
"filename_name_idx" btree (name)

And now the query;

Task: Return the names of subdirectories and files immediately below
a given path. For each none-empty subdirectory return children=true.
The 1st part of the union selects all subdirecories (per regex) and
the flatfiles contained in them plus one entry for the subdirectory
itself (left outer joins). More than one joined filename means: "The
subdirectory has children".
The 2nd part of the union returns all flatfiles, contained in the
given path.
The surrounding SELECT removes the given path and the trailing "/"
keeping only the subdirectory names from the pathnames, so they can
be merged with the flatfile names.

> you're pulling all the entries
> in a directory in two parts. The first
(second)
> part (files) is fairly straightforward. The second
(first)
> part (directories) consists of pulling any file whose parent is a
> subdirectory of the directory you're looking for (this is *all*
> children of the directory, since you have to retrieve every element
> that begins with the directory, then discard those that have an
> additional / in their name), counting how many of these there are
> for each subdirectory, and discarding those results except for a
> binary (yes there are children or no there aren't). This is a lot
> of useless work to go through, and is going to be slow if you've
> got a lot of stuff in a subdirectory.
I agree, but did not yet find another way.
> An alternative approach would be, for each directory, to store all
> its children (files and subdirectories) along with a flag
> indicating which it is. This would allow you to create the
> collapsed tree view without walking all the children of a
> subdirectory.
Perhaps in a temporary table?
>
> Assuming you can't make changes to the schema, what about the query?
Can be changed.
> You've got this:
Please reconsider your proposals with the above

> It's hard to say without knowing what's actually *in* the tables,
> but the existing query definately doesn't scale well for what I
> think it's trying to do.
>
> Mike Stone
Axel
Axel Rau, ☀Frankfurt , Germany +49-69-951418-0

In response to

Re: directory tree query with big planner variation at 2006-07-31 13:30:37 from Michael Stone

Responses

Re: directory tree query with big planner variation at 2006-07-31 15:21:44 from Michael Stone

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Michael Stone	2006-07-31 15:21:44	Re: directory tree query with big planner variation
Previous Message	Tom Lane	2006-07-31 14:28:46	Re: sub select performance due to seq scans