Re: Query using SeqScan instead of IndexScan

From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: chris smith <dmagick(at)gmail(dot)com>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query using SeqScan instead of IndexScan
Date: 2006-04-02 04:31:34
Message-ID: 442F53A6.9090707@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

chris smith wrote:

> <rant>
> It'd be nice if the database developers agreed on what terms meant.
>
> http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html
>
> The SHOW INDEX statement displays a cardinality value based on N/S,
> where N is the number of rows in the table and S is the average value
> group size. That ratio yields an approximate number of value groups in
> the table.
> </rant>
>
> A work colleague found that information a few weeks ago so that's
> where my misunderstanding came from - if I'm reading that right they
> use n_distinct as their "cardinality" basis.. then again I could be
> reading that completely wrong too.
>

Yeah that's right - e.g using the same table in postgres and mysql:

pgsql> SELECT attname,n_distinct,correlation
FROM pg_stats
WHERE tablename='fact0'
AND attname LIKE 'd%key';
attname | n_distinct | correlation
---------+------------+-------------
d0key | 10000 | -0.0211169
d1key | 100 | 0.124012
d2key | 10 | 0.998393
(3 rows)

mysql> SHOW INDEX FROM fact0
-> ;
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name |
Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| fact0 | 1 | fact0_d0key | 1 | d0key | A
| 10000 | NULL | NULL | | BTREE | |
| fact0 | 1 | fact0_d1key | 1 | d1key | A
| 100 | NULL | NULL | | BTREE | |
| fact0 | 1 | fact0_d2key | 1 | d2key | A
| 10 | NULL | NULL | | BTREE | |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)

It is a bit confusing - '(distinct) cardinality' might be a better
heading for their 'cardinality' column!

On the correlation business - I don't think Mysql calculates it (or if
it does, its not displayed).

> I believe postgres (because it's a lot more standards compliant).. but
> sheesh - what a difference!
>

Well yes - however, to be fair to the Mysql guys, AFAICS the capture and
display of index stats (and any other optimizer related data) is not
part of any standard.

Cheers

Mark

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2006-04-02 04:46:09 Re: Query using SeqScan instead of IndexScan
Previous Message Alvaro Herrera 2006-04-02 04:26:45 Re: Query using SeqScan instead of IndexScan