Quick Links

Re: RFC: planner statistics in 7.2

From:	Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: RFC: planner statistics in 7.2
Date:	2001-04-23 15:56:47
Message-ID:	3.0.5.32.20010424015647.03ca35e0@mail.rhyme.com.au
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

At 10:10 23/04/01 -0400, Tom Lane wrote:
>>> All that we're discussing here is one specific parameter in the cost
>>> estimation for an indexscan, viz, the extent to which the table ordering
>>> agrees with the index ordering.
>
>> This does not necessarily follow. A table ordering need not follow the sort
>> order of an index for the index to have a low indexscan cost. All that is
>> required is that most of the rows referred to by an index node must reside
>> in a page or pages that will be read by one IO. eg. a table that has a
>> sequence based ID, with, say 20% of rows updated, will work nicely with an
>> indexscan on the ID, even though it has never been clustered.
>
>Right, what matters is the extent of correlation between table ordering
>and index ordering, not how it got to be that way.

No; *local* ordering needs to *roughly* match. Global ordering and
record-by-record ordering don't matter.

For example, for a table with an ID field, the rows may be stored as (where
--- indicates a mythical page)

-----
5
9
6
7
-----
1
10
2
3
-----
4
8
11
12
-----

A sorted index may have nodes pointers to (1,2,3), (4,5,6), (7,8,9) and
(10,11,12). The first node would take 1 IO, then each of the others would
take 2. This would give a much more reasonable estimate for the indexscan
costs (assuming a random sample is adequate).

>> What I'm suggesting is that if you look at a random sample of index nodes,
>> you should be able to get a statistically valid estimate of the 'clumping'
>> of the data pointed to by the index.
>
>And I'm saying that you don't actually have to look at the index in
>order to compute the very same estimate.

No. Not given the above.

>The only property of the index
>that matters is its sort order; if you assume you know the right sort
>order (and in practice there's usually only one interesting possibility
>for a column) then you can compute the correlation just by looking at
>the table.

This is true, but only if you are strictly interested in sort order, which
I don't think we are.

>Andreas correctly points out that this approach doesn't extend very well
>to multi-column or functional indexes, but I'm willing to punt on those
>for the time being ...

My approach should work with arbitrary indexes.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

In response to

Re: RFC: planner statistics in 7.2 at 2001-04-23 14:10:33 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Adam Rose	2001-04-23 16:08:12	Re: row name length
Previous Message	Gregory Wood	2001-04-23 15:42:39	Re: Replication and on-line recovery