Re: Why is this doing a seq scan?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ingram, Bryan" <BIngram(at)sixtyfootspider(dot)com>
Cc: "'pgsql-sql(at)postgresql(dot)org'" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Why is this doing a seq scan?
Date: 2000-11-17 22:07:57
Message-ID: 14217.974498877@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I said:
> Ah, there's your problem --- the planner is not very smart about
> optimizing cross-datatype comparisons. Make these columns both text,
> or both varchar, and I'll bet you get a more intelligent plan.

After a little further thought, I realize that the planner may be
handicapped by not realizing it can do a merge or hash join across
datatypes, but even without that problem, this is not going to be
a fast query. What you've got is

select ... from atms x, zips y where x.zip = y.zip
order by 1 limit 3;

and there just isn't any way to process this without forming the
full join product --- ie, the thing will sit there and form a join
tuple for *every* valid combination of ATM and ZIP in your database,
and then compute the distance to the target point for every one of
those ATMs, and then sort that result, and finally give you only
the top three rows. A smarter kind of join isn't going to help
all that much; to make this fast, you need to be filtering
using the really selective condition (distance to the target point)
*before* you do the join.

If you are sufficiently interested in the speed of this query to want to
maintain a specialized index for it, I'd suggest looking at an r-tree
index on the location data, and then using a WHERE condition on the
r-tree index to prefilter the rows before you join. r-trees only work
on boxes and polygons AFAICT --- what would work nicely is to store a
"box" of very small dimensions surrounding the location of each ATM,
index that column, and then use a WHERE test for overlap between that
box column and a box surrounding the target point out to as far as you
think is likely to be interesting. This gives you back a fairly small
number of candidate ATMs for which you compute the exact distance to
the target, sort, and limit. Not sure that you need to join to zips
at all if you do it this way.

regards, tom lane

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Diehl, Jeffrey 2000-11-17 23:01:20 RE: UNION in views
Previous Message clayton cottingham 2000-11-17 21:45:37 Re: is there a mysql to postgresql sql converter?