> The thing here is that you are effectively causing Postgres to run a
> sub-select for each row of the "result" table, each time generating
> either an empty list or a list with one or more identical URLs. This
> is effectively forcing a nested loop. In a way, you have two
> constraints where you only need one.
> You can safely take out the constraint in the subquery, so it is
> like this:
> SELECT COUNT(*) FROM result WHERE url IN (SELECT shorturl FROM item);
> This will generate equivalent results, because those rows that
> didn't match the constraint wouldn't have affected the IN anyway.
> However, it will alter the performance, because the subquery will
> contain more results, but it will only be run once, rather than
> multiple times. This is effectively forcing a hash join (kind of).
> Whereas if you rewrite the query as I demonstrated earlier, then you
> allow Postgres to make its own choice about which join algorithm
> will work best.
Thank you! I learned a lot today :-)
I thought the subquery will be run on every row thus I tried to make
it as fast as possible by using a where clause. I didn't try your
first query on the hole table so it could be faster than mine approach.
In response to
pgsql-performance by date
|Next:||From: Franck Routier||Date: 2008-06-30 16:57:58|
|Subject: Does max size of varchar influence index size|
|Previous:||From: Alvaro Herrera||Date: 2008-06-30 16:23:06|
|Subject: Re: VACUUM ANALYZE blocking both reads and writes to atable|