Re: PG8.2.1 choosing slow seqscan over idx scan

From: "Chad Wagner" <chad(dot)wagner(at)gmail(dot)com>
To: "Jeremy Haile" <jhaile(at)fastmail(dot)fm>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: PG8.2.1 choosing slow seqscan over idx scan
Date: 2007-01-17 04:52:39
Message-ID: 81961ff50701162052s4779103dq3f0314fe2ac17134@mail.gmail.com
Lists: pgsql-performance

On 1/16/07, Jeremy Haile <jhaile(at)fastmail(dot)fm> wrote:
>
> The table is heavily inserted and deleted from. Recently I had done a
> very large delete.

That's what I suspected.

> Here is the results of the query you sent me: (sorry it's hard to read)
>
> "transaction_date";0;8;172593;-0.194848
>
> Just curious - what does that tell us?
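For anyone following along at home: that row looks like output from the
pg_stats view. A query along these lines (table and column names assumed
from this thread) would produce it:

```sql
-- Hypothetical reconstruction of the query Jeremy likely ran;
-- pg_stats is PostgreSQL's per-column statistics view.
SELECT attname, null_frac, avg_width, n_distinct, correlation
  FROM pg_stats
 WHERE tablename = 'transaction_facts'
   AND attname   = 'transaction_date';
```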

Based on the explain plan you posted earlier we learned the optimizer
believed the query would return 322558 rows (and it was reasonably accurate,
too) for a relatively small time frame (15 hours and 20 minutes).

-> Seq Scan on transaction_facts (cost=0.00..333928.25 rows=322558
width=16) (actual time=19917.957..140036.910 rows=347434 loops=1)

Based on the information you just posted, the average row length is 156
bytes.

347434 rows * 156 bytes = ~52MB (small enough that it could plausibly be
held in your shared buffers, which makes Tom's suggestion very plausible:
the index scan may not actually be cheaper -- it only looked cheaper because
the data was all cached)
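That arithmetic works out as follows (a quick sanity check, nothing more):

```python
# Back-of-the-envelope check of the table-size estimate above.
rows = 347_434          # actual rows from the EXPLAIN ANALYZE output
avg_row_bytes = 156     # average row length from the statistics
total_bytes = rows * avg_row_bytes
total_mb = total_bytes / (1024 * 1024)
print(f"{total_mb:.1f} MB")  # prints 51.7 MB, i.e. roughly 52MB
```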

The cost estimate when you are requesting a "range" of data involves the
"correlation" factor, which the documentation defines as:

"Statistical correlation between physical row ordering and logical ordering
of the column values. This ranges from -1 to +1. When the value is near -1
or +1, an index scan on the column will be estimated to be cheaper than when
it is near zero, due to reduction of random access to the disk. (This column
is NULL if the column data type does not have a < operator.)"

Which means that as the correlation approaches zero (and at -0.19 it is
effectively zero), the physical ordering of the data in the data files is
such that a range scan of the btree index would result in many scattered
reads (which are slow). So the optimizer weighs whether a "scattered" read
will be cheaper than a sequential scan based on a few factors: a)
correlation [for ranges], b) size of the table, and c) estimated
cardinality [how many rows it thinks you are going to get back]. Based on
those factors it may choose either access path (sequential or index).
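To make that weighing concrete, here is a toy sketch -- my own
simplification, emphatically NOT PostgreSQL's real cost model -- of how low
correlation inflates the random-I/O component of an index range scan until
a seq scan wins. The table page count and total row count are made up;
only the 347434 figure comes from the thread:

```python
# Toy illustration only -- not the actual planner formula.
def pick_plan(table_pages, rows_selected, rows_total, correlation,
              seq_page_cost=1.0, random_page_cost=4.0):
    selectivity = rows_selected / rows_total
    seq_cost = table_pages * seq_page_cost

    # Best case (|correlation| near 1): matching rows are packed into
    # adjacent pages, read sequentially.
    pages_seq = selectivity * table_pages
    # Worst case (correlation near 0): up to one random page per row,
    # capped at the size of the table.
    pages_rand = min(rows_selected, table_pages)

    idx_cost = (abs(correlation) * pages_seq * seq_page_cost
                + (1 - abs(correlation)) * pages_rand * random_page_cost)
    return "index scan" if idx_cost < seq_cost else "seq scan"

# ~15% selectivity with near-zero correlation: seq scan wins.
print(pick_plan(10_000, 347_434, 2_300_000, -0.19))   # prints seq scan
# Same query against a well-ordered table: index scan wins.
print(pick_plan(10_000, 347_434, 2_300_000, 0.98))    # prints index scan
```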

One of the reasons the sequential scan is slower is that the optimizer
doesn't know the data you are requesting is sitting in the cache (and it is
VERY unlikely you have the entire table in cache, unless it is a heavily
used or very small table, which it's probably not). So once it decides a
sequential scan is the best plan, it starts chugging away at the disk,
pulling in most of the pages in the table and throwing away the pages that
do not meet your criteria.

The index scan is quicker (the timing may be bogus, as Tom suggested)
because it starts chugging away at the index and finds that many of the
pages you are interested in are cached (but the planner didn't know that --
you just got lucky!).

In practice, once you start pulling in roughly 15% or more of the table it
is often cheaper just to read the entire table, rather than pay for
scattered reads plus double I/O. Remember that an index access means
reading the index PLUS the table from disk, while a sequential scan skips
the index and reads only the table.
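As a back-of-the-envelope illustration (using PostgreSQL's default cost
settings seq_page_cost=1.0 and random_page_cost=4.0, and an assumed 50
rows per page), the worst-case crossover is actually far below 15% --
caching and correlation are what push the practical figure up toward the
rule of thumb:

```python
# Rough break-even: at what fraction of the table does a fully
# scattered index scan (one random page fetch per matching row,
# ignoring index page reads) cost as much as reading everything?
seq_page_cost, random_page_cost = 1.0, 4.0  # PostgreSQL defaults
rows_per_page = 50                          # assumed, varies by row width

# pages * seq_page_cost == sel * pages * rows_per_page * random_page_cost
break_even = seq_page_cost / (rows_per_page * random_page_cost)
print(f"{break_even:.1%}")  # prints 0.5%
```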

I would suggest running EXPLAIN ANALYZE after restarting the database (even
that may not be good enough, because a large portion of the data file may
still be in the OS's buffer cache, hrm -- reboot? unmount?) and see how
cheap that index access path really is.

One thing to be careful of here is that you really need to consider the
primary use of the table and the major queries you will be launching
against it. You could improve the correlation by rebuilding the table
ordered by the transaction_date column, but that may hurt other range
scans. Another option is partitioning. I wouldn't do any of this until you
have confirmed the last tweak you made still holds up -- give it a few
days, perhaps test it after a clean reboot of the server.
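As a sketch, the rebuild-in-transaction_date-order option could be done
with CLUSTER (the index name below is hypothetical; note that 8.2 uses the
older `CLUSTER indexname ON tablename` syntax, while later releases switch
to `CLUSTER tablename USING indexname`):

```sql
-- Rewrites the table in index order, driving the correlation for
-- transaction_date toward +1. Takes an exclusive lock and rewrites
-- the whole table, so schedule it accordingly.
CLUSTER transaction_facts_date_idx ON transaction_facts;

-- Refresh the statistics so the planner sees the new correlation.
ANALYZE transaction_facts;
```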

Let me know if any of this is inaccurate folks, as I certainly don't profess
to be an expert on the internals of PostgreSQL, but I believe it is accurate
based on my prior experiences with other database products. :)

--
Chad
http://www.postgresqlforums.com/
