Re: Hash Join not using hashed index?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ang Chin Han <angch(at)pollcities(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Hash Join not using hashed index?
Date: 2000-06-29 03:15:12
Message-ID: 18724.962248512@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Ang Chin Han <angch(at)pollcities(dot)com> writes:
> No evidence, but I was hoping that having a prehashed country_id
> would speed things up a bit, since the seq scan on country could
> be redundant, requring only a seq scan on city and a index (hash)
> lookup on country.

> Or maybe this is a related question (just curious):

> pintoo=# explain select country_id from country order by country_id;
> Sort (cost=15.63..15.63 rows=253 width=2)
> -> Seq Scan on country (cost=0.00..5.53 rows=253 width=2)

> If there is already in b-tree index on country_id, why bother
> re-sorting it, when it could be output'd by traversing the tree?

In fact the planner does consider both of those alternatives (at
least in 7.0), but it estimates that they are *slower* than doing
it in what appears to be the hard way. Accessing an index is not
particularly cheap.

You can investigate what the planner thinks of these different
alternatives using the rather crude tool of disabling particular
scan types. For example:

regression=# create table country (country_id int2);
CREATE
regression=# create index country_id_index on country (country_id);
CREATE
regression=# explain select country_id from country order by country_id;
NOTICE: QUERY PLAN:

Index Scan using country_id_index on country (cost=0.00..60.00 rows=1000 width=2)

(Of course, since I haven't vacuumed this newly-created table, these
cost numbers are bogus; they are based on default assumptions about
table size, which are 1000 rows in 10 disk pages IIRC. With those
particular numbers an indexscan does look the cheapest. Anyway,
we're coming to the point now:)

regression=# set enable_indexscan = off;
SET VARIABLE
regression=# explain select country_id from country order by country_id;
NOTICE: QUERY PLAN:

Sort (cost=69.83..69.83 rows=1000 width=2)
-> Seq Scan on country (cost=0.00..20.00 rows=1000 width=2)

EXPLAIN

The planner chooses the indexscan here because 60.00 < 69.83, so it
thinks the indexscan is faster. But different values for the table
size and number of rows can easily drive it to make the other choice.

You could experiment with changing enable_indexscan and enable_seqscan
to see what the planner's estimates are for your tables and then
measure what the actual runtime is like for each plan type.

From my point of view, it's only a bug if the ratio of the cost
estimates is radically out of line with the ratio of the actual
runtimes.

regards, tom lane

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Thomas Swan 2000-06-29 03:54:08 Re: Finding the names of inheriting classes
Previous Message Ang Chin Han 2000-06-29 02:41:01 Re: Hash Join not using hashed index?