Re: Tell postgres which index to use?

From: Greg Stark <gsstark(at)mit(dot)edu>
To: John Arbash Meinel <john(at)arbash-meinel(dot)com>
Cc: Silke Trissl <trissl(at)informatik(dot)hu-berlin(dot)de>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Tell postgres which index to use?
Date: 2005-02-09 20:50:06
Message-ID: 873bw5e9vl.fsf@stark.xeocode.com
Lists: pgsql-performance

John Arbash Meinel <john(at)arbash-meinel(dot)com> writes:

> > -> Hash (cost=1418.68..1418.68 rows=3226 width=4) (actual time=77.062..77.062 rows=0 loops=1)
>
> This seems to be at least one of the problems. The planner thinks there
> are going to be 3000+ rows, but in reality there are 0.

No, that's a red herring. Hash nodes always report 0 rows.

> > Nested Loop (cost=0.00..23849.81 rows=7533 width=8) (actual time=0.341..198.162 rows=5798 loops=1)
> > -> Seq Scan on pdb_entry (cost=0.00..1418.68 rows=3226 width=4) (actual time=0.145..78.177 rows=3329 loops=1)
> > Filter: ((resolution > 0::double precision) AND (resolution < 1.7::double precision))
> > -> Index Scan using chain_pdb_id_ind on "chain" (cost=0.00..6.87 rows=6 width=8) (actual time=0.021..0.027 rows=2 loops=3329)
> > Index Cond: ("outer".id = "chain".pdb_id)

The actual number of records is pretty close to the estimated number. The difference
seems to come primarily from the selectivity of the join, where the planner thinks an
average of 6 rows will match each outer row whereas in fact only about 2 do.
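
If you want to see what that guess is based on, the per-column statistics the planner
uses are visible in pg_stats; roughly speaking, the 6-rows-per-match figure is chain's
row count divided by the n_distinct estimate for chain.pdb_id. Something like this
(table and column names taken from the plan above) shows what it's working from:

  SELECT tablename, attname, null_frac, n_distinct
  FROM pg_stats
  WHERE (tablename = 'chain' AND attname = 'pdb_id')
     OR (tablename = 'pdb_entry' AND attname IN ('id', 'resolution'));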

So it thinks it's going to read about 18,000 records out of 67,000, or about 25%. In
that case the sequential scan is almost certainly better. In fact it's only going to
read about 6,000, or just under 10%, in which case the sequential scan is probably
still better, but it's not so clear-cut.
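
Spelled out, using the numbers from the plan above:

  estimated: 3226 outer rows x ~6 matches each  -> the ~18,000 figure (about 25% of 67,000)
  actual:    3329 outer rows, 5,798 rows out    -> the ~6,000 figure (just under 10%)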

I suspect the only reason you're seeing such a big difference, when I would expect the
two plans to be about even, is that nearly all the data is cached. In that case the
non-sequential access pattern of the nested loop has little cost.

You might get away with lowering random_page_cost, but since the planner thinks it's
going to read 25% of the table, I suspect you'll have to get very close to 1 before it
switches over, if it switches at all. Be careful about tuning settings like this based
on a single query, especially to unrealistically low values.
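
If you do experiment with it, try it per-session first rather than in postgresql.conf,
along these lines (the SELECT is only a stand-in reconstructed from the plan above,
not your actual query):

  SET random_page_cost = 2;   -- default is 4; lower it gradually rather than jumping to 1
  EXPLAIN ANALYZE
  SELECT p.id, c.pdb_id
  FROM pdb_entry p
  JOIN "chain" c ON c.pdb_id = p.id
  WHERE p.resolution > 0 AND p.resolution < 1.7;
  RESET random_page_cost;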

You might also want to try raising the statistics target on pdb_entry. See if
that makes the estimate go down from 6 to closer to 2.
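
For what it's worth, the per-column knob looks like this. Which columns matter most
here is a guess on my part: the join estimate depends on pdb_entry.id and chain.pdb_id,
the filter estimate on resolution. The default target is 10; remember to re-ANALYZE
afterwards so the planner sees the new statistics:

  ALTER TABLE pdb_entry ALTER COLUMN id SET STATISTICS 100;
  ALTER TABLE pdb_entry ALTER COLUMN resolution SET STATISTICS 100;
  ALTER TABLE "chain" ALTER COLUMN pdb_id SET STATISTICS 100;
  ANALYZE pdb_entry;
  ANALYZE "chain";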

--
greg
