Re: Inconsistant use of index.

From: Ron Mayer <ron(at)intervideo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-bugs(at)postgresql(dot)org>, Ron Mayer <ron(at)intervideo(dot)com>
Subject: Re: Inconsistant use of index.
Date: 2002-04-03 19:14:27
Message-ID: Pine.LNX.4.33.0204031108250.24107-100000@ron
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


On Tue, 26 Mar 2002, Tom Lane wrote:
> Ron Mayer <ron(at)intervideo(dot)com> writes:
> >> I'm particularly interested in the correlation estimate for the dat
> >> column. (Would you happen to have an idea whether the data has been
> >> inserted more-or-less in dat order?)
>
> > I beleve much of February was loaded first, then we back-filled January,
> > and daily I've been adding March's results. I don't believe the index-usage
> > stopped when we did the january fill... something happend a few days ago after
> > a pretty routine daily load.
>
> The correlation estimate for dat is pretty low (0.086088), which I think
> reflects the fact that on a macro level your data is not very orderly
> (Feb/Jan/Mar). However, if it's been loaded on a daily basis then the
> records for any single day will be together --- which is why the indexed
> probe for a single day is so fast. I don't see any way that we can
> expect the system to model this effect with only one ordering-correlation
> number :-( ... so a proper fix will have to wait for some future release
> when we can think about having more extensive stats about ordering.
>
> In the meantime, it would be interesting to see if re-establishing the
> big-picture order correlation would persuade the planner to do the right
> thing. Could you do something like this:
>
> CREATE TABLE foo AS SELECT * FROM fact ORDER BY dat;
> TRUNCATE TABLE fact;
> INSERT INTO fact SELECT * FROM foo;
> DROP TABLE foo;
> VACUUM ANALYZE fact;
>
> (this should leave you in a state where pg_stats shows correlation 1.0
> for fact.dat) and then see what you get from EXPLAIN?
>
> regards, tom lane

I did quite a bit more playing with this, and no matter what the
correlation was (1, -0.001), it never seemed to have any effect
at all on the execution plan.

Should it? With a high correlation the index scan is a much better choice.

Ron

---
--- create the table with a correlation of "1".
---
logs2=# CREATE TABLE fact_by_dat AS SELECT * FROM fact ORDER BY dat;
SELECT
logs2=# CREATE INDEX fact_by_dat__dat ON fact_by_dat(dat);
CREATE
logs2=# vacuum analyze fact_by_dat;
VACUUM
logs2=# select correlation from pg_stats where tablename='fact_by_dat' and attname='dat';
correlation
-------------
1
(1 row)

---
--- Still does the "Seq Scan"
---
logs2=# explain analyze select count(*) from fact_by_dat where dat='2002-03-01';
NOTICE: QUERY PLAN:
Aggregate (cost=380347.31..380347.31 rows=1 width=0) (actual time=77785.14..77785.14 rows=1 loops=1)
-> Seq Scan on fact (cost=0.00..379816.25 rows=212423 width=0) (actual time=20486.16..77420.05 rows=180295 loops=1)
Total runtime: 77785.28 msec
EXPLAIN

---
--- Disable Seq Scan... 30 times faster.
---
logs2=# set enable_seqscan to off;
SET VARIABLE
logs2=# explain analyze select count(*) from fact_by_dat where dat='2002-03-01';
NOTICE: QUERY PLAN:
Aggregate (cost=5502.57..5502.57 rows=1 width=0) (actual time=2671.20..2671.20 rows=1 loops=1)
-> Index Scan using fact_by_dat__dat on fact_by_dat (cost=0.00..4974.99 rows=211036 width=0)
(actual time=90.24..2339.64 rows=180295 loops=1)
Total runtime: 2693.87 msec

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2002-04-03 20:22:50 Re: Inconsistant use of index.
Previous Message pgsql-bugs 2002-04-03 17:47:27 Bug #628: move works incorrectly on cursors using GiST indexes