Re: directory tree query with big planner variation

From: Axel Rau <Axel(dot)Rau(at)Chaos1(dot)DE>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: directory tree query with big planner variation
Date: 2006-07-31 15:06:00
Message-ID: A57707A5-9987-4991-B775-6892B3DDD7DC@Chaos1.DE
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Am 31.07.2006 um 15:30 schrieb Michael Stone:

> If I understand the intend of this SQL,
Let me show the tables first:
Table "bacula.path" ( 65031 rows)
Column | Type | Modifiers
--------+---------
+-------------------------------------------------------
pathid | integer | not null default nextval('path_pathid_seq'::regclass)
path | text | not null ( complete pathnames of all
directories )
Indexes:
"path_pkey" PRIMARY KEY, btree (pathid)
"path_name_idx" btree (path)

Table "bacula.file" (3021903 rows)
Column | Type | Modifiers
------------+---------
+-------------------------------------------------------
fileid | integer | not null default nextval
('file_fileid_seq'::regclass)
fileindex | integer | not null default 0
jobid | integer | not null
pathid | integer | not null (FK)
filenameid | integer | not null (FK)
markid | integer | not null default 0
lstat | text | not null
md5 | text | not null
Indexes:
"file_pkey" PRIMARY KEY, btree (fileid)
"file_fp_idx" btree (filenameid, pathid)
"file_jobid_idx" btree (jobid)
"file_path_idx" btree (pathid)

Table "bacula.filename" ( 160559 rows)
Column | Type | Modifiers
------------+---------
+---------------------------------------------------------------
filenameid | integer | not null default nextval
('filename_filenameid_seq'::regclass)
name | text | not null
Indexes:
"filename_pkey" PRIMARY KEY, btree (filenameid)
"filename_name_idx" btree (name)

And now the query;

Task: Return the names of subdirectories and files immediately below
a given path. For each none-empty subdirectory return children=true.
The 1st part of the union selects all subdirecories (per regex) and
the flatfiles contained in them plus one entry for the subdirectory
itself (left outer joins). More than one joined filename means: "The
subdirectory has children".
The 2nd part of the union returns all flatfiles, contained in the
given path.
The surrounding SELECT removes the given path and the trailing "/"
keeping only the subdirectory names from the pathnames, so they can
be merged with the flatfile names.

> you're pulling all the entries
> in a directory in two parts. The first
(second)
> part (files) is fairly straightforward. The second
(first)
> part (directories) consists of pulling any file whose parent is a
> subdirectory of the directory you're looking for (this is *all*
> children of the directory, since you have to retrieve every element
> that begins with the directory, then discard those that have an
> additional / in their name), counting how many of these there are
> for each subdirectory, and discarding those results except for a
> binary (yes there are children or no there aren't). This is a lot
> of useless work to go through, and is going to be slow if you've
> got a lot of stuff in a subdirectory.
I agree, but did not yet find another way.
> An alternative approach would be, for each directory, to store all
> its children (files and subdirectories) along with a flag
> indicating which it is. This would allow you to create the
> collapsed tree view without walking all the children of a
> subdirectory.
Perhaps in a temporary table?
>
> Assuming you can't make changes to the schema, what about the query?
Can be changed.
> You've got this:
Please reconsider your proposals with the above

> It's hard to say without knowing what's actually *in* the tables,
> but the existing query definately doesn't scale well for what I
> think it's trying to do.
>
> Mike Stone
Axel
Axel Rau, ☀Frankfurt , Germany +49-69-951418-0

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Stone 2006-07-31 15:21:44 Re: directory tree query with big planner variation
Previous Message Tom Lane 2006-07-31 14:28:46 Re: sub select performance due to seq scans