Re: poor performance of db?

From: John Arbash Meinel <john(at)arbash-meinel(dot)com>
To: SpaceBallOne <space_ball_one(at)hotmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: poor performance of db?
Date: 2005-01-25 03:15:28
Message-ID: 41F5B9D0.2010803@arbash-meinel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

SpaceBallOne wrote:

> Thanks for the reply John,
>
> There are approximately 800 rows total in our job table (which stays
> approximately the same because 'completed' jobs get moved to a
> 'job_archive' table).The other jobs not shown by the specific query
> could be on backorder status, temporary deleted status, etc etc.
>
> You are correct in assuming the _id and _no (stands for 'number')
> fields are unique - this was one of the first pages I built when I
> started learning postgres, so not knowing how to set up primary and
> foriegn keys at the time, I did it that way ... it is normalised to a
> point (probably rather sloppy, but its a juggling act between learning
> on the fly, what I'd like to have, and time constraints of being the
> only I.T. guy in the company!)...
>
> I think I will definitely focus on converting my database and php
> pages to using proper primary keys in postgres - especially if they
> automatically index themselves. I didn't do a vacuum analyse on them
> so that may explain why they didn't seem to do much.

You probably can add them now if you don't want to do a lot of redesign.
ALTER TABLE job ADD PRIMARY KEY (id);

If they are not unique this will cause problems, but as they should be
unique, I think it will work.

I'm not sure how much help indexes will be if you only have 800 rows,
and your queries use 300+ of them.

You might need re-think the query/table design.

You might try doing nested queries, or explicit joins, rather than one
big query with a WHERE clause.

Meaning do stuff like:

SELECT
(job JOIN customer ON job.customer_no = customer.customer_no) as jc
JOIN location on jc.location_no = location.location_no
...

I also see that the planner seems to mis-estimate the number of rows in
some cases. Like here:

> -> Hash (cost=14.53..14.53 rows=853 width=19) (actual
> time=9.79..9.79 rows=0 loops=1)
> -> Seq Scan on street (cost=0.00..14.53 rows=853
> width=19) (actual time=0.01..5.12 rows=853 loops=1)
> -> Hash (cost=9.91..9.91 rows=491 width=20) (actual
> time=5.73..5.73 rows=0 loops=1)
> -> Seq Scan on ubd (cost=0.00..9.91 rows=491 width=20)
> (actual time=0.02..2.98 rows=491

Where it thinks the hash will return all of the rows from the sequential
scan, when in reality it returns none.

I think problems with the planner fall into 3 categories.

1. You didn't VACUUM ANALYZE.
2. You did, but the planner doesn't keep sufficient statistics (ALTER
TABLE job ALTER COLUMN no SET STATISTICS <a number>)
3. You're join needs cross column statistics, which postgres doesn't
support (yet).

If you only have 800 rows, I don't think you have to worry about
statistics, so that leaves things at 1 or 3. If you did do 1, then I
don't know what to tell you.

John
=:->

PS> I'm not a guru at this stuff, so some of what I say may be wrong.
But hopefully I point you in the right direction.

>
> Thanks,
> Dave
> space_ball_one(at)hotmail(dot)com
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message andrew 2005-01-25 06:18:11 Re: poor performance of db?
Previous Message Jim C. Nasby 2005-01-25 03:11:04 Re: PostgreSQL clustering VS MySQL clustering