Inconsistant use of index.

From: Ron Mayer <ron(at)intervideo(dot)com>
To: <pgsql-bugs(at)postgresql(dot)org>
Cc: Ron Mayer <ron(at)intervideo(dot)com>
Subject: Inconsistant use of index.
Date: 2002-03-26 17:06:39
Message-ID: Pine.LNX.4.33.0203260845110.16667-100000@ron
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

In porting a pretty large (10s of millions of records) data warehouse
from Oracle to PostgreSQL,

Once some of my tables started getting pretty large, PostgreSQL
suddenly stopped using indexes when I use expressions like "col = value"
decreasing performance by 20X. This meant that my daily reports started
taking two days instead of 2 hours to run!!!!).

Interestingly when I re-write the queries using >= and <= to produce
identical results, the index works fine. Example queries in question include:

select count(*) from fact where dat='2002-03-01';
select count(*) from fact where dat<='2002-03-01' and dat>='2002-03-01';

The distribution of values in "dat" are roughly evenly spaced from
'2002-01-01' through '2002-03-25'.

Attached below are

A: Information about the table, including "\d" and "vacuum verbose analyze"
B: Output of "explain analyze" from the above queries (showing the 20X
slowdown)
C: Version and configuration information.

Any suggestions on what I should look at next would be appreciated.

Thanks much,
Ron

PS: As a quite perverse workaround, I rewrote all my queries to have
"col<=val and col>=val" everywhere I used to have "col=val"
and everything is running fine again... but that's just wierd.

============================================================================
== A: Information about the table
============================================================================
logs2=# \d fact
Table "fact"
Column | Type | Modifiers
--------+------------------------+-----------
dat | date |
tim | time without time zone |
ip_id | integer |
bid_id | integer |
req_id | integer |
ref_id | integer |
Indexes: i_fact__bid_id,
i_fact__dat,
i_fact__ref_id,
i_fact__req_id,
i_fact__tim

logs2=# select count(*) from fact;
count
----------
18410778
(1 row)

logs2=# vacuum verbose analyze fact;
NOTICE: --Relation fact--
NOTICE: Pages 144967: Changed 0, Empty 0; Tup 18410778: Vac 0, Keep 0, UnUsed
0.
Total CPU 11.56s/2.97u sec elapsed 71.91 sec.
NOTICE: Analyzing fact
VACUUM

============================================================================
== B: Explain Analyze for the two queries.
== Note that the <=, >= one was over 20X faster.
============================================================================

logs2=# explain analyze select count(*) from fact where dat='2002-03-01';
NOTICE: QUERY PLAN:

Aggregate (cost=375631.14..375631.14 rows=1 width=0) (actual
time=76689.42..76689.42 rows=1 loops=1)
-> Seq Scan on fact (cost=0.00..375101.72 rows=211765 width=0) (actual
time=20330.96..76391.94 rows=180\
295 loops=1)
Total runtime: 76707.92 msec

EXPLAIN
logs2=# explain analyze select count(*) from fact where dat<='2002-03-01' and
dat >='2002-03-01';
NOTICE: QUERY PLAN:

Aggregate (cost=5.98..5.98 rows=1 width=0) (actual time=2921.39..2921.39
rows=1 loops=1)
-> Index Scan using i_fact__dat on fact (cost=0.00..5.98 rows=1 width=0)
(actual time=73.55..2583.53 ro\
ws=180295 loops=1)
Total runtime: 2921.55 msec

EXPLAIN
logs2=#

============================================================================
== C: Version and configuration information.
============================================================================

[17]localhost:~/apps/pgsql% psql -V
psql (PostgreSQL) 7.2
contains support for: readline, history
Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group
Portions Copyright (c) 1996, Regents of the University of California
Read the file COPYRIGHT or use the command \copyright to see the
usage and distribution terms.

[17]localhost:/scratch/pgsql/data% diff -wiu postgresql.conf postgresql.conf.bak
--- postgresql.conf Sat Mar 23 15:39:34 2002
+++ postgresql.conf.bak Tue Mar 5 19:33:54 2002
@@ -50,7 +50,7 @@
#shared_buffers = 10000 # 2*max_connections, min 16
## goes to about 84 meg with 4000.
#shared_buffers = 4000 # 2*max_connections, min 16
-shared_buffers = 10000 # 2*max_connections, min 16
+shared_buffers = 8000 # 2*max_connections, min 16

#max_fsm_relations = 100 # min 10, fsm is free space map
#max_fsm_pages = 10000 # min 1000, fsm is free space map

--
Ronald Mayer
Director of Web Business
InterVideo, Inc.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Beckstette 2002-03-26 17:18:02 postmaster crash
Previous Message Vaishakhi Ajmera 2002-03-26 16:09:07 Installing postgres