Re: More tablescanning fun

From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: More tablescanning fun
Date: 2003-04-25 04:59:24
Message-ID: 20030424235924.B66185@flake.decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Apr 24, 2003 at 07:58:30PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> > It seems like the metrics used for the cost of index scanning v. table
> > scanning on large tables need to be revisited. It might be such a huge
> > difference in this case because the table is essentially clustered on
> > the primary key.
>
> Probably. What does the correlation figure in pg_stats show as?

stats=# select attname, correlation from pg_stats where
tablename='email_contrib';
attname | correlation
------------+-------------
project_id | 1
id | 0.449204
date | 0.271775
team_id | 0.165588
work_units | 0.0697928

> There's been some previous debate about the equation used to correct
> for correlation, which is certainly bogus (I picked it more or less
> out of the air ;-)). But so far no one has proposed a replacement
> equation with any better foundation ... take a look in
> src/backend/optimizer/path/costsize.c if you want to get involved.

Are you reffering to the PF formula?

> > Also, is there a TODO to impliment
> > real clustered indexes?
>
> No. It's not apparent to me how you could do that without abandoning
> MVCC, which we're not likely to do.

Hmm... does MVCC mandate inserts go at the end? My understanding is that
each tuple indicates it's insert/last modified time; if this is the
case, why would a true clustered index break mvcc? I guess an update
that moves the tuple would be tricky, but I'm guesing there's some kind
of magic that could happen there... worst case would be adding an
'expired' timestamp.

On the other hand, it might be possible to get the advantages of a
clustered index without doing a *true* clustered index. The real point
is to be able to use indexes; I've heard things like 'if you need to
access more than 10% of a table then using an index would be
disasterous', and that's not good... that number should really be over
50% for most reasonable ratios of fields indexed to fields in table (of
course field size plays a factor).
--
Jim C. Nasby (aka Decibel!) jim(at)nasby(dot)net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-04-25 05:23:10 Re: More tablescanning fun
Previous Message Tom Lane 2003-04-24 23:58:30 Re: More tablescanning fun