Skip site navigation (1) Skip section navigation (2)

Re: More tablescanning fun

From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: More tablescanning fun
Date: 2003-04-25 04:59:24
Message-ID: 20030424235924.B66185@flake.decibel.org (view raw or flat)
Thread:
Lists: pgsql-performance
On Thu, Apr 24, 2003 at 07:58:30PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> > It seems like the metrics used for the cost of index scanning v. table
> > scanning on large tables need to be revisited. It might be such a huge
> > difference in this case because the table is essentially clustered on
> > the primary key.
> 
> Probably.  What does the correlation figure in pg_stats show as?
 
stats=# select attname, correlation from pg_stats where
tablename='email_contrib';
  attname   | correlation 
  ------------+-------------
  project_id |           1
  id         |    0.449204
  date       |    0.271775
  team_id    |    0.165588
  work_units |   0.0697928

> There's been some previous debate about the equation used to correct
> for correlation, which is certainly bogus (I picked it more or less
> out of the air ;-)).  But so far no one has proposed a replacement
> equation with any better foundation ... take a look in 
> src/backend/optimizer/path/costsize.c if you want to get involved.

Are you reffering to the PF formula?

> > Also, is there a TODO to impliment
> > real clustered indexes?
> 
> No.  It's not apparent to me how you could do that without abandoning
> MVCC, which we're not likely to do.
 
Hmm... does MVCC mandate inserts go at the end? My understanding is that
each tuple indicates it's insert/last modified time; if this is the
case, why would a true clustered index break mvcc? I guess an update
that moves the tuple would be tricky, but I'm guesing there's some kind
of magic that could happen there... worst case would be adding an
'expired' timestamp.

On the other hand, it might be possible to get the advantages of a
clustered index without doing a *true* clustered index. The real point
is to be able to use indexes; I've heard things like 'if you need to
access more than 10% of a table then using an index would be
disasterous', and that's not good... that number should really be over
50% for most reasonable ratios of fields indexed to fields in table (of
course field size plays a factor).
-- 
Jim C. Nasby (aka Decibel!)                    jim(at)nasby(dot)net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2003-04-25 05:23:10
Subject: Re: More tablescanning fun
Previous:From: Tom LaneDate: 2003-04-24 23:58:30
Subject: Re: More tablescanning fun

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group