Re: index usage (and foreign keys/triggers)

From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Patrik Kudo <kudo(at)pingpong(dot)net>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: index usage (and foreign keys/triggers)
Date: 2003-02-26 15:49:57
Message-ID: 20030226074602.F71868-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 26 Feb 2003, Patrik Kudo wrote:

> Hi gurus et al ;)
>
> I have a database with a couple of hundred tables. In this database one
> table, "person", represents a user. person.userid is a primary key. To
> this key there are about a hundred foreign keys spread out over
> aproximately equaly many tables. When deleting a user I noticed a quite
> big difference in time depending on how much data there are in the
> foreign key-tables. After some investigation I concluded that for some
> users and some tables the indices wheren't used when deleting, resulting
> in longer run-times.

>
> Here's an example:
>
> select count(*) from login;
> count
> -------
> 96824
>
> select count(*) from login where userid = 'patrik';
> count
> -------
> 608
>
> select count(*) from login where userid = 'jennie';
> count
> -------
> 4211
>
> explain delete from login where userid = 'patrik';
> QUERY PLAN
>
> ---------------------------------------------------------------------------------
> Index Scan using login_userid_idx on login (cost=0.00..237.06 rows=61
> width=6)
> Index Cond: (userid = 'patrik'::text)
>
> explain delete from login where userid = 'jennie';
> QUERY PLAN
> -----------------------------------------------------------
> Seq Scan on login (cost=0.00..2045.30 rows=3421 width=6)
> Filter: (userid = 'jennie'::text)
>
>
> What makes the planer choose seq scan for 'jennie', but not for
> 'patrik'? I also tested the following:

Well at 3421 of 96824 it's estimating that the cost is lower, what's
the explain look like with seqscan turned off (my guess'd be it's
slightly higher cost). It's possible that random_page_cost should
be lower, or that perhaps there's some level of clustering in the data
that's not being picked up. You might want to try raising the
number of statistics buckets and re-analyzing just to see if that helps.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Kari Lempiainen 2003-02-26 15:52:49 timezone_hour missing
Previous Message Tom Lane 2003-02-26 15:46:02 Re: Using PGSQL to help coordinate many servers