Re: Performance problem with semi-large tables

From: "David Parker" <dparker(at)tazznetworks(dot)com>
To: "Ken Egervari" <ken(at)upfactor(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance problem with semi-large tables
Date: 2005-01-29 22:04:26
Message-ID: 07FDEE0ED7455A48AC42AC2070EDFF7C3EE0B5@corpsrv2.tazznetworks.com
Lists: pgsql-performance

You don't mention whether you have run VACUUM or VACUUM ANALYZE lately;
that's generally one of the first things folks will suggest. If you have
a lot of updates, VACUUM will clean up dead tuples; if you have a lot of
inserts, VACUUM ANALYZE will update the statistics so that the planner
can make better decisions (as I understand it).
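
In case it helps, running both on the two tables you mention would look
something like this (table names taken from your message; adjust as
needed):

VACUUM ANALYZE shipment;
VACUUM ANALYZE shipment_status;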

Another data point people will ask for when helping you is EXPLAIN
ANALYZE output from the queries you think are slow.
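
For instance, for the example query in your message, that would just be
the same statement prefixed with EXPLAIN ANALYZE:

EXPLAIN ANALYZE
select s.*, ss.*
from shipment s, shipment_status ss, release_code r
where s.current_status_id = ss.id
and ss.release_code_id = r.id
and r.filtered_column = '5'
order by ss.date desc
limit 100;

That will show the plan the planner actually chose, along with the real
row counts and timings for each step.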

- DAP

________________________________

From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Ken
Egervari
Sent: Wednesday, January 26, 2005 9:17 PM
To: pgsql-performance(at)postgresql(dot)org
Subject: [PERFORM] Performance problem with semi-large tables


Hi everyone.

I'm new to this forum and was wondering if anyone would be kind
enough to help me out with a pretty severe performance issue. I believe
the problem to be rather generic, so I'll put it in generic terms.
Since I'm at home and not at work (but this is really bugging me), I
can't post any specifics. However, I think my explanation will
suffice.

I have 2 tables that are getting large and will only get
larger with time (exponentially, as more users sign on to the system).
Right now, a table called 'shipment' contains about 16,000 rows and
'shipment_status' contains about 32,500 rows. These aren't massive
tables (I keep reading about tables with millions of rows), but they
will definitely get into 6 digits by next year, and query performance
is quite poor.

Now, from what I understand about tuning, you want to
specify good filters and provide good indexes on the driving filter as
well as on any referential keys used while joining. This has helped me
solve performance problems many times in the past (for example, taking
a query from 2 seconds down to 21 milliseconds).
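
For illustration only (the index names here are made up, but the
columns are the ones used in the example query below), that kind of
indexing looks like:

create index idx_shipment_current_status
    on shipment (current_status_id);
create index idx_shipment_status_release_code
    on shipment_status (release_code_id);
create index idx_shipment_status_date
    on shipment_status (date);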

However, I am now tuning queries that operate on these two
tables where the filters aren't very good (the best has a filter ratio
of 0.125) and the number of rows returned is very large (before any
limit is applied).

For example, consider something like this query that takes ~1
second to finish:

select s.*, ss.*
from shipment s, shipment_status ss, release_code r
where s.current_status_id = ss.id
and ss.release_code_id = r.id
and r.filtered_column = '5'
order by ss.date desc
limit 100;

Release code is just a very small table of 8 rows, judging by
the production data, hence the 0.125 filter ratio. However, the
distribution is skewed: the filtered value actually matches about 54%
of the rows in shipment_status once it joins. Postgres seems to be
doing a sequential scan to pull out all of these rows. Next, it joins
approximately 17,550 rows to shipment. Since this query has a limit, it
only returns the first 100, which seems like a waste.

Now, for this query, I know I can filter on the date instead to
speed it up. For example, I can probably search for all the shipments
from the last 3 days instead of limiting the result to 100. But since
this isn't a real production query, I only wanted to show it as an
example; in many cases I cannot filter by the date (and the sort may be
on the date or on some other column that doesn't help as a filter).
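
Just as a sketch of what I mean (again, not a real production query),
the date-filtered version would be something like:

select s.*, ss.*
from shipment s, shipment_status ss, release_code r
where s.current_status_id = ss.id
and ss.release_code_id = r.id
and r.filtered_column = '5'
and ss.date > current_date - interval '3 days'
order by ss.date desc;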

I'm just stressed out about how I can make queries like this
more efficient, since all I see is a bunch of hash joins and sequential
scans taking all kinds of time.

I guess here are my 2 questions:

1. Should I just beg to change the requirements so that I
can write more specific queries and add more screens to access them?
2. Can you recommend ways to make Postgres act on big tables
more efficiently? I'm not really interested in this specific case (I
just made it up). I'm more interested in general solutions to the
general problem of big tables with poor filters, where join order
doesn't seem to help much.

Thank you very much for your help.

Best Regards,
Ken Egervari
