Re: Question on a select

From: "Vincent Hikida" <vhikida(at)inreach(dot)com>
To: "Madison Kelly" <linux(at)alteeve(dot)com>, "PgSQL General List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Question on a select
Date: 2005-01-02 09:52:05
Message-ID: 00b801c4f0b0$b8514260$6501a8c0@HOMEOFFICE
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>
> They are all 'not null' and I am trying to do exactly the kind of task
> you described. I tried the first example on my DB and got a syntax error:
>
> tle-bu=> SELECT a.file_name, a.file_parent_dir, a.file_type FROM
> file_info_1 a WHERE NOT EXIST (SELECT NULL FROM file_set_1 b WHERE
> b.fs_name=a.file_name, b.fs_parent_dir=a.file_parent_dir,
> b.fs_type=a.file_type);
> ERROR: syntax error at or near "SELECT" at character 88
>
I've quickly read the thread and I don't think you got an answer as to why
you are getting a syntax error here. Your query shows something line

WHERE b.fs_name=a.file_name, b.fs_parent_dir=a.file_parent_dir

you need to put an AND instead of a comma:

WHERE b.fs_name=a.file_name AND b.fs_parent_dir=a.file_parent_dir

As for which of the queries is best I don't know. My background is as an
Oracle developer. I think that Bruno already suggested testing the three
queries. There is a trace utility which shows some of what happens under the
covers of a query. I've used it extensively in Oracle but have never used it
in Postgresql.

If I understand what you said, the NOT IN was significantly slower. That has
been my experience in Oracle long time ago so I've tended to shy away from
that syntax. I'm sure optimizers are much better now then when I
experimented with NOT IN but my coworker who tried it in Oracle was getting
a slower response than with a subselect about a year ago. Theoretically if 3
queries are logically equivalent as the three queries you've been given, an
optimizer should find the same best query plan to execute it. I don't think
that optimizers are that smart yet.

The outer join is probably doing either a sort merge or a hash join. In your
application this should be the best option. (A sort merge sorts both tables
first or at least the key columns and then merges the tables together.)

Bruno said that the subselect would be slower. It may be that he thinks it
will do a nested loop. That is that it will read each row in table A and try
to find that concatenated key in table B's index. I don't think that a
nested loop would be very good in your particular application.

As for the indexes you set up, I think they are correct indexes.

Vincent

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Vincent Hikida 2005-01-02 09:59:34 Re: Question on a select
Previous Message Madison Kelly 2005-01-02 07:55:15 Re: Question on a select