Re: Compound keys and foreign constraints

From: <wespvp(at)syntegra(dot)com>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Manfred Koizar <mkoi-pg(at)aon(dot)at>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject: Re: Compound keys and foreign constraints
Date: 2004-04-02 22:09:57
Message-ID: BC9340D5.C19F%wespvp@syntegra.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 4/2/04 12:33 PM, "Alvaro Herrera" <alvherre(at)dcc(dot)uchile(dot)cl> wrote:

>> -> Index Scan using message_recipients_i_recipient on
>> message_recipients r (cost=0.00..30569.25 rows=30622 width=21) (actual
>> time=83.146..31609.149 rows=312741 loops=1)
>
> This estimate is off by an order of magnitude. Maybe you want to
> increase the statistic target for this column ...

Please explain. The only thing I can find about this is

"For scalar data types, attstattarget is both the target number of ³most
common values² to collect, and the target number of histogram bins to
create."

Why would this affect the row count estimations?

With 10 being the default, what would an appropriate value be?

I changed it to 100 for both message_key and reciepent (in
message_recipients table). That used the same plan and made the discrepancy
worse 13000 estimated, 312000 actual:

> -> Index Scan using message_recipients_i_recipient on
> message_recipients r (cost=0.00..13324.75 rows=13249 width=21) (actual
> time=4751.701..34373.808 rows=312741 loops=1)

And, it still only uses the compound index if I set random_page_cost to < 1.

Analyze is run regularly by auto-vacuum (multiple times a day - I need to
reduce this).

If I read this right, before changing the statistic target:

db=# select relname, relfilenode from pg_class where relname like
'message_recip%';

relname | relfilenode
--------------------------------+-------------
message_recipients_i_recipient | 17234
message_recipients_i_message | 17235
message_recipients | 17162
(3 rows)

db=> select attname, attrelid, attstattarget from pg_attribute where
attname='recipient';
attname | attrelid | attstattarget
-----------+----------+---------------
recipient | 17162 | -1
recipient | 17234 | 0

db=# select staattnum, stadistinct from pg_statistic where starelid=17162;
staattnum | stadistinct
-----------+-------------
1 | 1.11264e+06
2 | 3598
3 | 2
(3 rows)

db=# select attname, attstattarget, attnum from pg_attribute where
attrelid=17162;
attname | attstattarget | attnum
----------------+---------------+--------
tableoid | 0 | -7
cmax | 0 | -6
xmax | 0 | -5
cmin | 0 | -4
xmin | 0 | -3
oid | 0 | -2
ctid | 0 | -1
message_key | -1 | 1
recipient | -1 | 2
recipient_type | -1 | 3

There are 20+ million (unique) message_key records, so stadistinct for
column 1 is off by a factor of 20. There are just under 1.2 million unique
recipients (60 million total records), so column 2 (recipient) is off by a
factor of over 300.

In the messages table, message_key (which is unique) has statdistinct = -1

db=# alter table message_recipients alter column recipient set statistics
100;
db=# alter table message_recipients alter column message_key set statistics
100;
db=# analyze;

db=# select distinct relname, attname, attnum, attstattarget, stadistinct
from pg_class c, pg_attribute a, pg_statistic s where
c.relfilenode=a.attrelid and c.relfilenode=starelid and a.attnum=s.staattnum
and relname='message_recipients' order by attnum;

relname | attname | attnum | attstattarget | stadistinct
--------------------+----------------+--------+---------------+-------------
message_recipients | message_key | 1 | 100 | 2.19256e+06
message_recipients | recipient | 2 | 100 | 8672
message_recipients | recipient_type | 3 | -1 | 2

Stadistinct is about twice what it was before, but is still way off the
number of distinct values for that column. Manually setting stadistinct
closer to the number of unique values makes the discrepancy still larger.

Wes

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Randall Skelton 2004-04-02 23:05:29 Casting int to bool with join...
Previous Message Manfred Koizar 2004-04-02 21:53:29 Re: [GENERAL] Large DB