Re: Any way to favor index scans, but not bitmap index scans?

From: "Francisco Reyes" <lists(at)stringsutils(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Any way to favor index scans, but not bitmap index scans?
Date: 2008-07-23 20:12:10
Message-ID: 0b8afb1dff45c26725b204a62de0a4be@stringsutils.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 3:37 pm 07/23/08 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Francisco Reyes" <lists(at)stringsutils(dot)com> writes:
> > SET ENABLE_SEQSCAN TO OFF;
> > SET ENABLE_BITMAPSCAN TO OFF;
> > Aggregate (cost=25665216.10..25665216.11 rows=1 width=12) (actual
> > time=3088.894..3088.896 rows=1 loops=1)
> > -> Nested Loop (cost=0.00..25662307.70 rows=387785 width=12)
> > (actual time=0.264..2624.680 rows=194734 loops=1)
> > -> Index Scan using join_ids_join_id on join_ids
> > (cost=0.00..2867051.21 rows=5020 width=4) (actual
> > time=0.237..1236.019 rows=4437 loops=1)
> > Filter: (customer_id = ANY ('{1014,2485,4636,4635,12
> 55,547,374,580}'::integer[]))
> > -> Index Scan using historical_join_id_date on historical
> > (cost=0.00..4522.43 rows=1477 width=16) (actual
> > time=0.010..0.153 rows=44 loops=4437)
> > Index Cond: ((historical.join_id =
> > join_ids.join_id) AND (historical.date > '2007-04-01'::date)
> > AND (historical.date < '2008-05-01'::date))
> > Filter: (trans.f5 > 0::numeric)
> > Total runtime: 3091.227 ms --> 3 seconds
>
> You might be more likely to get a sane plan if you had an index on
> join_ids.customer_id.

There is an index in join_ids:
joinids_customerids_joinid" btree (customer_id, joinid) WITH (fillfactor=98)

Also, that plan is only 3 seconds. That is as good as that is going to get.
Or where you refering that the other plans would have been better?

> The first indexscan above is really a
> completely silly choice, and would never have been used if you
> weren't holding a gun to the planner's head.

I have much to learn about how to properly read an explain analyze, but as
silly as that plan may look it outperforms the other plans by orders of
magnitude. 3 seconds vs 12 minutes is a very big difference. It was so fast
that I even compared the results (which happens to be a single row) to make
sure I was getting the correct value.

>The index isn't contributing any selectivity at all.

Which index scan? Are analyze read bottom up right?
If it is this one you are refering to:

-> Index Scan using historical_join_id_date on historical
> > (cost=0.00..4522.43 rows=1477 width=16) (actual
> > time=0.010..0.153 rows=44 loops=4437)
> > Index Cond: ((historical.join_id =
> > join_ids.join_id) AND (historical.date > '2007-04-01'::date)
> > AND (historical.date < '2008-05-01'::date))
> > Filter: (trans.f5 > 0::numeric)

I believe that is the reason performance is good with that plan.
The number of rows that need to be returned from historical is less than 1%.

> The other part of the problem is the factor-of-thirty overestimate of
> the number of rows that the inner indexscan will produce (which means
> also a factor-of-thirty overestimate of its cost). Perhaps higher
> statistics targets for these two relations would give you a better
> estimate there.

Is it possible to go over
default_statistics_target = 1000?

> since there's no additional join condition. What PG version are you
> running exactly?

8.3.3

I have only been at this job for 3 months and I can say that neither the
data, nor the previous design I am trying to replace play nice with
postgresql. I can't get into specifics, but I can say that our "historical"
tables have about 60% data that is not used in most queries. I think that
is partly what throws off the planner so much. My first clue was when I saw
the planner trying to do sequential scans to retrieve less than 1% of rows.
It didn't make sense.

I tried several schemes with partitioning and that was even worse.

I am going to convert the tables structure names to the mapping names I
used in these thread. Perhaps that may be informative.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2008-07-23 20:23:09 Re: Equality search on timestamp value returns no rows
Previous Message dpage 2008-07-23 19:58:02 Re: mac install question