Re: PG8.2.1 choosing slow seqscan over idx scan

From: "Chad Wagner" <chad(dot)wagner(at)gmail(dot)com>
To: "Jeremy Haile" <jhaile(at)fastmail(dot)fm>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: PG8.2.1 choosing slow seqscan over idx scan
Date: 2007-01-17 17:26:50
Message-ID: 81961ff50701170926r3dd41f39u9e2d4a131b3c3097@mail.gmail.com
Lists: pgsql-performance

On 1/17/07, Jeremy Haile <jhaile(at)fastmail(dot)fm> wrote:
>
> Maybe - I tried running the same query for an older time range that is
> less likely to be cached. The index scan took longer than my previous
> example, but still only took 16 seconds, compared to the 87 seconds
> required to seqscan the table. When I can, I'll restart the machine and
> run a comparison again to get a "pure" test.

Heh, querying a different range of data was a better idea than rebooting... doh. So I think you have reasonably established that an index scan over uncached data would still be faster than a sequential scan.
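For anyone following along, the kind of comparison I have in mind looks roughly like this (the table name is just a placeholder for yours, and forcing enable_seqscan off is only for the experiment, not something to leave on):

  -- Force the planner onto the index and time it:
  SET enable_seqscan = off;
  EXPLAIN ANALYZE
    SELECT count(*)
      FROM transactions                          -- placeholder table name
     WHERE transaction_date >= '2006-12-01'
       AND transaction_date <  '2006-12-02';

  -- Then let the planner pick the sequential scan and compare:
  RESET enable_seqscan;
  EXPLAIN ANALYZE
    SELECT count(*)
      FROM transactions
     WHERE transaction_date >= '2006-12-01'
       AND transaction_date <  '2006-12-02';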

To me this is one of those cases where the optimizer doesn't understand the clustering of the data, and it is being misled by the statistics and fixed parameters it has. If you have fast disks (I think a fiber SAN probably counts here), then adjusting random_page_cost lower is reasonable; the lowest I have heard recommended is 2.0. It would be nice if the database could learn to estimate these values, as newer versions of Oracle do.
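Something like this is a safe way to experiment with it (the 2.0 figure is just the commonly quoted lower bound, not a measured value for your hardware):

  -- Try it per-session first:
  SET random_page_cost = 2.0;
  -- then re-run the problem query under EXPLAIN ANALYZE and compare the plan and timing

  -- If the plans improve, make it permanent in postgresql.conf:
  --   random_page_cost = 2.0
  -- and reload the config (pg_ctl reload, or SELECT pg_reload_conf();)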

> Date is almost always a criteria in scans of this table. As mentioned
> earlier, the table is naturally built in date order - so would
> rebuilding the table help? Is it possible that even though I'm
> inserting in date order, since I delete rows so often the physical
> correlation would get disrupted as disk pages are reused? Perhaps
> clustering on the transaction_date index and periodically running
> "cluster" would help? Does vacuum full help with this at all?

Yes, cluster would rebuild the table for you. I wouldn't do anything too
intrusive: run with random_page_cost lowered, perhaps a vacuum full and
reindex, and see what happens. If it degrades over time, then I would start
looking at partitioning or some other solution.
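Roughly, assuming the table is called transactions and the index on transaction_date is transactions_date_idx (adjust the names to yours):

  -- 8.2 syntax: CLUSTER indexname ON tablename
  CLUSTER transactions_date_idx ON transactions;  -- rewrites the table in index order;
                                                  -- takes an exclusive lock while it runs
  ANALYZE transactions;                           -- refresh statistics after the rewrite

  -- Or the less drastic route:
  VACUUM FULL transactions;
  REINDEX TABLE transactions;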

> Yeah - partitioning makes a lot of sense and I've thought about doing
> this in the past. Although I do run queries that cross multiple days,
> most of my queries only look at today's data, so the physical disk
> organization would likely be much better with a partitioned table setup.
> Also, since I usually delete old data one day at a time, I could simply
> drop the old day's partition. This would make vacuuming much less of an
> issue.

Yep, my thoughts exactly. Partitioning support in PostgreSQL is there, but
it needs tighter integration into the core (I shouldn't have to create a
view, n tables, n rules, etc.). Additionally, I have read that performance
degrades once you get past some number of partitions, but I haven't really
looked into it myself.
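For the record, the manual-partitioning dance looks roughly like this in 8.2 (names are illustrative, and you need one child table, one CHECK constraint, and one insert rule or trigger per day):

  CREATE TABLE transactions (
      transaction_date  timestamp NOT NULL,
      amount            numeric
  );

  -- One child per day, constrained to its date range:
  CREATE TABLE transactions_20070117 (
      CHECK (transaction_date >= '2007-01-17' AND transaction_date < '2007-01-18')
  ) INHERITS (transactions);

  CREATE INDEX transactions_20070117_date_idx
      ON transactions_20070117 (transaction_date);

  -- Route inserts on the parent into the right child:
  CREATE RULE transactions_ins_20070117 AS
      ON INSERT TO transactions
      WHERE (transaction_date >= '2007-01-17' AND transaction_date < '2007-01-18')
      DO INSTEAD INSERT INTO transactions_20070117 VALUES (NEW.*);

  -- Let the planner skip children whose CHECK constraints exclude the query range:
  SET constraint_exclusion = on;

  -- Expiring a day's data is then just:
  DROP TABLE transactions_20070117;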

--
Chad
http://www.postgresqlforums.com/
