Re: BUG #7495: chosen wrong index

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <psql(at)elbrief(dot)de>,<pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #7495: chosen wrong index
Date: 2012-08-15 15:08:21
Message-ID: 502B75150200002500049751@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

<psql(at)elbrief(dot)de> wrote:

> insert into bla ( a , b )
> select a , a
> from generate_series( 1 , 1000000 ) as a ( a ) ;

> explain analyze select * from bla
> where b > 990000 order by a limit 10 ;
> [uses index on b and has a long run time]

The problem is that PostgreSQL doesn't have any sense of the
correlation between columns a and b (i.e., they are always equal)
and assumes that it will find enough matching rows soon enough on
the scan of the index on b to make it cheaper than sorting the
results of finding all rows that match the predicate. Try your test
suite again with the only change being the insert statement:

insert into bla ( a , b )
select a , floor(random() * 1000000) + 1
from generate_series( 1 , 1000000 ) as a ( a ) ;

On my machine, with that data, all of the queries run fast.

We've been looking at ways to develop statistics on multiple
columns, so that correlations like that don't confuse the optimizer,
or trying to evaluate the "risk" of a query taking a long time based
on unexpected correlations.

Not really a bug; more like a recognized opportunity to improve the
optimizer.

-Kevin

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Angel Zúñiga 2012-08-15 15:19:14 ProblemWithCharsOSX
Previous Message Andres Freund 2012-08-15 14:09:54 Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size