More tablescanning fun

From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: More tablescanning fun
Date: 2003-04-24 23:38:17
Message-ID: 20030424183817.A66185@flake.decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On this table

project_id | integer | not null
id | integer | not null
date | date | not null
team_id | integer | not null
work_units | bigint | not null
Indexes: email_contrib_pkey primary key btree (project_id, id, date)

with this breakdown of data

project_id | count
------------+----------
5 | 56427141
8 | 1058843
24 | 361595
25 | 4092575
205 | 58512516

Any kind of operation on an entire project wants to tablescan, even
though it's going to take way longer.

explain analyze select sum(work_units) from email_contrib where
project_id=8;

Index scan 126, 56, 55 seconds
Seq. scan 1517, 850, 897 seconds

It seems like the metrics used for the cost of index scanning v. table
scanning on large tables need to be revisited. It might be such a huge
difference in this case because the table is essentially clustered on
the primary key. I can test this by doing an aggregate for, say, a
specific team_id, which would be pretty well spread across the entire
table, but that'll have to wait a bit.

Anyone have any thoughts on this? Also, is there a TODO to impliment
real clustered indexes? Doing stuff by project_id on this table in
sybase was very efficient, because there was a real clustered index on
the PK. By clustered index, I mean an index where the leaf nodes of the
B-tree were the actual table rows. This means the only overhead in going
through the index is scanning the branches, which in this case would be
pretty light-weight.

Is this something that I should be using some PGSQL-specific feature
for, like inheritance?

I've been really happy so far with PGSQL (comming from Sybase and DB2),
but it seems there's still some pretty big performance issues that want
to be addressed (or I should say performance issues that hurt really big
when you hit them :) ).
--
Jim C. Nasby (aka Decibel!) jim(at)nasby(dot)net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-04-24 23:58:30 Re: More tablescanning fun
Previous Message Frederic Jolliton 2003-04-24 17:40:17 Re: Important speed difference between a query and a