Does "correlation" mislead the optimizer on large tables?

From: Ron Mayer <ron(at)intervideo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Ron Mayer <ron(at)intervideo(dot)com>
Subject: Does "correlation" mislead the optimizer on large tables?
Date: 2003-01-24 04:16:09
Message-ID: Pine.LNX.4.44.0301231909070.986-100000@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Short summary:

On a large tables, I think the "correlation" pg_stats field as calculated
by "vacuum analyze" or "analyze" can mislead the optimizer.

By forcing index scans on some queries shown below, some queries
in my database speed up from 197 seconds to under 30 seconds.

I'd like feedback on whether or not having a smarter "analyze"
function (which I think I could write as a separate utility) would
help me situations like this.

Longer:

In particular, if I have a large table t with columns 'a','b','c', etc,
and I cluster the table as follows:

create table t_ordered as select * from t order by a,b;
vacuum analyze t_ordered;

Column "b" will (correctly) get a very low "correlation" in
the pg_stats table -- but I think the optimizer would do better
assuming a high correlation because similar 'b' values are still
grouped closely on the same disk pages.

Below is a real-world example of this issue.

The table "fact" is a large one (reltuples = 1e8, relpages = 1082385)
and contains about 1 years worth of data. The data was loaded
sequentialy (ordered by dat,tim).

logs=# \d fact;
Table "fact"
Column | Type | Modifiers
--------+------------------------+-----------
dat | date |
tim | time without time zone |
ip_id | integer |
bid_id | integer |
req_id | integer |
ref_id | integer |
uag_id | integer |
Indexes: i_fact_2__bid_id,
i_fact_2__dat,
i_fact_2__tim,
i_fact_2__ip_id,
i_fact_2__ref_id,
i_fact_2__req_id

With a table this large, each day's worth of data contains
about 3000 pages; or conversely, each page contains only about
a 30 second range of values for "tim".

As shown in the queries below, the optimizer wanted to do
a sequential scan when looking at a 10 minute part of the day.
However also as shown, forcing an index scan did much better.

I'm guessing this happened because the optimizer saw the
horrible correlation, and decided it would have to read
an enormous number of pages if it did an index scan.

===========================================

logs=# select tablename,attname,n_distinct,correlation from pg_stats where tablename='fact';
tablename | attname | n_distinct | correlation
-----------+---------+------------+-------------
fact | dat | 365 | 1
fact | tim | 80989 | -0.00281447
fact | ip_id | 44996 | 0.660689
fact | bid_id | 742850 | 0.969026
fact | req_id | 2778 | 0.67896
fact | ref_id | 595 | 0.258023
fact | uag_id | 633 | 0.234216
(7 rows)

logs=# explain analyze select * from fact where tim<'00:10:00';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on fact (cost=0.00..1949838.40 rows=526340 width=32) (actual time=0.39..197447.50 rows=402929 loops=1)
Filter: (tim < '00:10:00'::time without time zone)
Total runtime: 197810.01 msec
(3 rows)

logs=# explain analyze select * from fact where tim<'00:10:00';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on fact (cost=0.00..1949838.40 rows=526340 width=32) (actual time=15.25..156705.76 rows=402929 loops=1)
Filter: (tim < '00:10:00'::time without time zone)
Total runtime: 157089.15 msec
(3 rows)

logs=# set enable_seqscan = off;
SET
logs=# explain analyze select * from fact where tim<'00:10:00';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Index Scan using i__fact__tim on fact (cost=0.00..2110978.39 rows=526340 width=32) (actual time=104.41..23307.84 rows=402929 loops=1)
Index Cond: (tim < '00:10:00'::time without time zone)
Total runtime: 23660.95 msec
(3 rows)

logs=# explain analyze select * from fact where tim<'00:10:00';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Scan using i__fact__tim on fact (cost=0.00..2110978.39 rows=526340 width=32) (actual time=0.03..1477.35 rows=402929 loops=1)
Index Cond: (tim < '00:10:00'::time without time zone)
Total runtime: 1827.94 msec
(3 rows)

logs=#

*******************************************************************************
*******************************************************************************

So two questions:

a) Am I on to something.... or is something else the reason why
the optimizer chose the much slower sequential scan?

b) If I did write an "analyze" that tried to set "correlation" values
that took into account such local grouping of data, would anyone
be interested?

Ron

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-01-24 06:29:35 Re: Crash Recovery
Previous Message Noah Silverman 2003-01-24 03:32:58 Crash Recovery