Re: Question on a select

From: Madison Kelly <linux(at)alteeve(dot)com>
To: PgSQL General List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Question on a select
Date: 2005-01-02 07:55:15
Message-ID: 41D7A8E3.1050507@alteeve.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Bruno Wolff III wrote:
> There should be parenthesis around the list to test.
> WHERE a_name, a_type, a_dir NOT IN (
> should be
> WHERE (a_name, a_type, a_dir) NOT IN (

That did it (I think)!

>
> I believe that the NOT IN query should run comparably to the LEFT JOIN
> example supplied by the other person (at least in recent versions of
> Postgres). I would expect this to run faster than using NOT EXISTS.
> You probably want to try all 3. The semantics of the three ways of doing
> this are not all equivalent if there are NULLs in the data being used
> to eliminate rows. As you indicated you don't have NULLs this shouldn't
> be a problem.
>
> Another way to write this is using set different (EXCEPT or EXCEPT ALL)
> using the key fields and then joining back to table a to pick up the
> other fields. However this will almost certianly be slower than the
> other methods.

Something odd, now that I have the other method working (I think)...

tle-bu=> SELECT a.file_name, a.file_parent_dir, a.file_type FROM
file_info_1 a LEFT JOIN file_set_1 b ON a.file_name=b.fs_name AND
a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type WHERE
b.fs_name IS NULL;

returns the results in roughly 1 or 2 seconds on a test data set of
15,000 entries. I have an index on both 'file_info_1' covering
'fs_name', 'fs_parent_dir' and 'fs_type' and on 'file_set_1' covering
'file_name', 'file_parent_dir' and 'file_type'. When I try the seconds
method though:

tle-bu=> SELECT file_name, file_parent_dir, file_type FROM file_info_1
WHERE (file_name, file_parent_dir, file_type) NOT IN (SELECT fs_name,
fs_parent_dir, fs_type FROM file_set_1);

It took so long to process that after roughly three minutes I stopped
the query for fear of overheating my laptop (which happend a while back
forcing a thermal shut down).

The indexes are:

CREATE INDEX file_info_#_display_idx ON file_info_# (file_type,
file_parent_dir, file_name);
CREATE INDEX file_set_#_sync_idx ON file_set_# (fs_name, fs_parent_dir,
fs_type)

Are these not effective for the second query? If not, what should I
change or add? If so, would you have any insight into why there is such
an incredible difference in performance?

Thanks very much again!!

Madison

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vincent Hikida 2005-01-02 09:52:05 Re: Question on a select
Previous Message Bruno Wolff III 2005-01-02 07:52:49 Re: Question on a select