Re: Is this planner choice easily explained?

From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Is this planner choice easily explained?
Date: 2002-11-22 14:13:03
Message-ID: 5.1.0.14.0.20021122110611.06a87438@mail.rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


>
>Could we see the pg_stats rows for the columns used in these queries?
>(You have done a vacuum analyze recently, I trust...)

Strange you should ask; it is actually taking some effort to persuade them
it's a good idea. They also don't believe that a LIMIT clause affects
strategy choice, so it's an uphill battle. As a result, my strong suspicion
is that there will be no results in pg_stat - but I have asked for the data.

I am mainly interested in understanding the output from this point of view
-- and to understand the choice that has been made in such a 'boundary
case' (when no analyze has been done).

Am I correct in my interpretation that:

explain SELECT messageblk FROM messageblks WHERE message_idnr =
100::bigint
ORDER BY messageblk_idnr ;

Sort (cost=5793.33..5793.33 rows=1453 width=40)
-> Index Scan using messageblks_msg_idx on messageblks
(cost=0.00..5716.99 rows=1453 width=40)

means it expects to get 1453 rows based on a search for a specific key
(hence why it has a high cost)? Based on other tests I have done, I have
concluded that it assumes a selectivity of 0.5% for non-unique indexes - is
that right?

Whereas with:

explain SELECT messageblk FROM messageblks WHERE message_idnr =
100::bigint
ORDER BY messageblk_idnr
limit 1;

Limit (cost=0.00..777.50 rows=1 width=40)
-> Index Scan using messageblks_id_idx on messageblks
(cost=0.00..1129984.15 rows=1453 width=40)

it looks like the high cost on the last line here is based on the number of
pages/tuples in the file, and that the limit is causing 1/1453th of the
cost to be applied. It looks like it gets the 1453 as a basic default
selectivity again.

The problem I have with this is that it seems that the estimate for the
LIMIT is assuming that the first row returned will be the right one; in
fact it has no guarantee that the specified criteria will be satisfied by
the row. Is this assumption likely to be wrong when we do have stats? ISTM
that a LIMIT with a WHERE clause that is not satisfied by any index used in
the search should not be deemed to reduce the result set size as
drastically. In fact, it may actually have to scan the entire result set
to get the row. Err...I think 8-).

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Hai-Chen Tu 2002-11-22 14:31:30 dbmirror bug
Previous Message Frdric SURLEAU 2002-11-22 09:36:10 About timestamp and timestamptz