Re: RFC: planner statistics in 7.2

From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: RFC: planner statistics in 7.2
Date: 2001-04-23 15:56:47
Message-ID: 3.0.5.32.20010424015647.03ca35e0@mail.rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At 10:10 23/04/01 -0400, Tom Lane wrote:
>>> All that we're discussing here is one specific parameter in the cost
>>> estimation for an indexscan, viz, the extent to which the table ordering
>>> agrees with the index ordering.
>
>> This does not necessarily follow. A table ordering need not follow the sort
>> order of an index for the index to have a low indexscan cost. All that is
>> required is that most of the rows referred to by an index node must reside
>> in a page or pages that will be read by one IO. eg. a table that has a
>> sequence based ID, with, say 20% of rows updated, will work nicely with an
>> indexscan on the ID, even though it has never been clustered.
>
>Right, what matters is the extent of correlation between table ordering
>and index ordering, not how it got to be that way.

No; *local* ordering needs to *roughly* match. Global ordering and
record-by-record ordering don't matter.

For example, for a table with an ID field, the rows may be stored as (where
--- indicates a mythical page)

-----
5
9
6
7
-----
1
10
2
3
-----
4
8
11
12
-----

A sorted index may have nodes pointers to (1,2,3), (4,5,6), (7,8,9) and
(10,11,12). The first node would take 1 IO, then each of the others would
take 2. This would give a much more reasonable estimate for the indexscan
costs (assuming a random sample is adequate).

>> What I'm suggesting is that if you look at a random sample of index nodes,
>> you should be able to get a statistically valid estimate of the 'clumping'
>> of the data pointed to by the index.
>
>And I'm saying that you don't actually have to look at the index in
>order to compute the very same estimate.

No. Not given the above.

>The only property of the index
>that matters is its sort order; if you assume you know the right sort
>order (and in practice there's usually only one interesting possibility
>for a column) then you can compute the correlation just by looking at
>the table.

This is true, but only if you are strictly interested in sort order, which
I don't think we are.

>Andreas correctly points out that this approach doesn't extend very well
>to multi-column or functional indexes, but I'm willing to punt on those
>for the time being ...

My approach should work with arbitrary indexes.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Adam Rose 2001-04-23 16:08:12 Re: row name length
Previous Message Gregory Wood 2001-04-23 15:42:39 Re: Replication and on-line recovery