Re: How to get the optimizer to use an index with multiple

From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: How to get the optimizer to use an index with multiple
Date: 2003-04-28 22:54:39
Message-ID: 20030428175439.L66185@flake.decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Apr 28, 2003 at 01:52:57PM -0600, scott.marlowe wrote:
> On Mon, 28 Apr 2003, Jim C. Nasby wrote:
>
> > select id into temp NewRetires from stats_participant where retire_to>=1
> > AND retire_date = (SELECT last_date FROM Project_statsrun WHERE
> > project_id = :ProjectID);
> >
> > results in a table with 5 values...
> >
> > explain analyze delete from email_rank where project_id=25 and id in
> > (select id from NewRetires);
> >
> > Index Scan using email_rank__day_rank on email_rank
> > (cost=0.00..9003741627715.16 rows=45019 width=6) (actual time=408.12..9688.37 rows=3 loops=1)
> > Index Cond: (project_id = 25)
> > Filter: (subplan)
> > SubPlan
> > -> Seq Scan on newretires (cost=100000000.00..100000020.00 rows=1000 width=4) (actual time=0.01..0.05 rows=5 loops=91834)
> > Total runtime: 9689.86 msec
> >
> > But, there's already an index that would fit the bill here perfectly:
> >
> > Table "public.email_rank"
> > Column | Type | Modifiers
> > -----------------------+---------+--------------------
> > project_id | integer | not null
> > id | integer | not null
> > first_date | date | not null
> > last_date | date | not null
> > day_rank | integer | not null default 0
> > day_rank_previous | integer | not null default 0
> > overall_rank | integer | not null default 0
> > overall_rank_previous | integer | not null default 0
> > work_today | bigint | not null default 0
> > work_total | bigint | not null default 0
> > Indexes: email_rank_pkey primary key btree (project_id, id),
> > email_rank__day_rank btree (project_id, day_rank),
> > email_rank__overall_rank btree (project_id, overall_rank)
> >
> > Why isn't it using email_rank_pkey instead of using day_rank then a
> > filter? The original query on sybase (see below) is essentially instant,
> > because it's using the index of (project_id, id), so it doesn't have to
> > read the whole table.
>
> It looks like the seq scan is newretires up there, from your 'id in
> (select id from NewRetires);' part of your query. I.e. the where in() has
> to be done first, and the query planner has no stats on that table, so it
> assumes a seq scan will be faster in case we need the whole thing anyway.
>
> Try adding an analyze newretires in there between the two queries.
>
> No clue as to why it's choosing one index over the other. I don't think
> that really matters a lot, it's the seq scan on the temp table that is
> taking your time on this.

There's no index at all on the temporary table; I fully expect it to
seqscan than. :) The issue is the choice of index on email_rank. It's
only going to hit at most 5 rows in email_rank (which it should be able
to figure out based on newretires and the fact that email_rank_pkey is
unique. I didn't show it, but I did run analyze on the temporary table
(why it doesn't have statistics I don't know...)
--
Jim C. Nasby (aka Decibel!) jim(at)nasby(dot)net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew T. O'Connor 2003-04-29 04:41:38 Re: [HACKERS] Changing the default configuration
Previous Message scott.marlowe 2003-04-28 19:52:57 Re: How to get the optimizer to use an index with multiple