On Oct 30, 2007, at 9:23 AM, Richard Huxton wrote:
> Ketema wrote:
>> I have the following query that is a part of a function:
> Yikes! Difficult to get a clear view of what this query is doing.
It seems complicated because you only have a small subset of the
schema these tables tie into.
I'd be happy to share the whole thing if it's needed.
> OK, I'm assuming you're vacuumed and analysed on all these tables...
Yes. Autovacuum is on, and I do a full VACUUM every 2 days.
>> My concern is with the sort step that takes 15 seconds by itself:
>> -> Sort (cost=1235567017.53..1238002161.29 rows=974057502)
>> (actual time=16576.997..16577.513 rows=3366 loops=1)
> That's taking hardly any time, the startup time is 16576.997
> already. Of course, the row estimate is *way* out of line.
OK. I misread the plan and took the startup time as the time it took
to perform the operation. Thanks for the link on EXPLAIN ANALYZE.
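If I'm now reading the numbers right, the actual-time pair makes this clear:

    Sort (actual time=16576.997..16577.513 rows=3366 loops=1)

The sort itself accounts for only 16577.513 - 16576.997, about half a
millisecond; the 16.5 seconds of "startup" is spent in the child nodes
producing the 3366 rows before the sort can emit its first one.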
> If you look here (where the explain is a bit easier to see)
> The two main things to look at seem to be the nested loops near the
> top and a few lines down the materialise (cost=326...
> These two nested loops seem to be pushing the row estimates wildly
> out of reality. They also consume much of the time.
> The immediate thing that leaps out here is that you are trying to
> join an int to an array of ints. Why are you using this setup
> rather than a separate table?
I see what you are talking about. When I initially used this set up
it was because I wanted to avoid a table that had a ton of rows in it
that I knew I would have to join to often. So I made a column that
holds on average 4 or 5 ints representing "products" on a particular
"order". I did not realize that using a function in the join would be
worse than simply having a large table.
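If I follow you, something like this is what you are suggesting (the
table and column names here are placeholders, not my real schema):

    -- A separate order/product link table instead of an int[] column.
    CREATE TABLE order_products (
        order_id   integer NOT NULL REFERENCES orders (id),
        product_id integer NOT NULL REFERENCES products (id),
        PRIMARY KEY (order_id, product_id)
    );
    -- Index for lookups from the product side.
    CREATE INDEX order_products_product_idx ON order_products (product_id);

    -- The join then becomes a plain equijoin the planner can estimate:
    SELECT o.id, op.product_id
    FROM orders o
    JOIN order_products op ON op.order_id = o.id;

With 4 or 5 products per order the link table has more rows, but the
planner gets real statistics on it instead of guessing at the contents
of an array.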
>> How can I improve this step?
>> Things I have thought about:
>> 1)Creating indexes on the aggregates...Found out this can't be done.
> Nope - not sure what it would mean in any case.
My initial thought was that the counts were causing the slowdown. This
is not the issue, though, as you have shown.
>> 2)Create Views of the counts and the sub select...is this any faster
>> as the view is executed at run time anyway?
> Might make the query easier to write, won't make it faster. Not
> without materialised views which are the fancy name for #3...
>> 3)Create actual tables of the sub select and aggregates...How would
>> this be maintained to ensure it was always accurate?
Given how this system is used I may take this route, as I think it
will require the fewest changes; a rough sketch of what I have in mind
is below.
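To keep such a table accurate I am thinking of triggers along these
lines (names are hypothetical, and the counts table would need to be
seeded first, e.g. from the existing data):

    -- Summary table holding the per-order product count.
    CREATE TABLE order_counts (
        order_id   integer PRIMARY KEY,
        n_products integer NOT NULL DEFAULT 0
    );

    -- Keep the count current as link rows come and go.
    CREATE OR REPLACE FUNCTION bump_order_count() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE order_counts SET n_products = n_products + 1
             WHERE order_id = NEW.order_id;
        ELSE  -- DELETE
            UPDATE order_counts SET n_products = n_products - 1
             WHERE order_id = OLD.order_id;
        END IF;
        RETURN NULL;  -- AFTER trigger, so the return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER order_products_count
        AFTER INSERT OR DELETE ON order_products
        FOR EACH ROW EXECUTE PROCEDURE bump_order_count();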
>> 4)Increasing hardware resources. Currently box is on a single
>> processor amd64 with 8Gb of RAM. below are the settings for resource
>> shared_buffers = 65536
>> temp_buffers = 5000
>> max_prepared_transactions = 2000
These settings are from postgresql.conf. Currently sysctl.conf is
set to kernel.shmmax = 805306368.
max_connections is 300 and I usually have about 200 connections open.
>> work_mem = 131072
>> maintenance_work_mem = 512000
> Can't say about these without knowing whether you've got only one
> connection or 100.
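About 200, as mentioned above. If I understand work_mem correctly (in
8.1 it is in kB, per sort or hash step), 131072 means each sort may use
up to 128 MB, so 200 connections each running one big sort could in
theory ask for about 200 x 128 MB = 25 GB, far more than the 8 GB in
the box. Sounds like I should lower it.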
>> max_stack_depth = 7168
>> max_fsm_pages = 160000
>> max_fsm_relations = 4000
>> The only function of this box is for Pg, so I do not mind it using
>> every last drop of ram and resources that it can.
>> 5)Upgrade version of pg..currently is running 8.1.4
> Well every version gets better at planning, so it can't hurt.
At one point I did go to 8.2.3 on a dev box and performance was
horrible. I have not had the opportunity to work out how to make the
postgresql.conf file in 8.2 match the settings in 8.1, as some things
have changed.
> Richard Huxton
> Archonet Ltd