Skip site navigation (1) Skip section navigation (2)

Re: Hash join on int takes 8..114 seconds

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Andrus <kobruleht2(at)hot(dot)ee>, PFC <lists(at)peufeu(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hash join on int takes 8..114 seconds
Date: 2008-11-23 21:30:40
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
> Appoaches which probably does not change perfomance:

> 6. Upgrade to 8.4 or to 8.3.5

Both of these will improve performance a little, even with the same query plan and same data.  I would expect about a 10% improvement for 8.3.x on most memory bound select queries. 8.4 won't be out for a few months.

> 7. run server on standalone mode and recover 1 GB pg_shdepend bloated index.

> 8. tune some conf file parameters:
> > work_mem = 512
> I'd consider increasing this value a little - 0.5 MB seems too low to me
> (but not necessarily).

This is very easy to try.  You can change work_mem for just a single session, and this can in some cases help performance quite a bit, and in others not at all.
I would not recommend having it lower than at least 4MB on a server like that unless you have a lot of concurrently active queries / connections.
To try it, simply use the SET command.  To try out 32MB, just do:
SET work_mem = '32MB';
and the value will be changed locally for that session only.  See if it affects your test query or not.

> > effective_cache_size= 70000
> Well, your server has 2GB of RAM and usually it's recommended to set
> this value to about 60-70% of your RAM, so using 540MB (25%) seems quite
> low.

> Data size is nearly the same as RAM size. It is unpleasant surprise that
> queries take so long time.

> What should I do next?

First, demonstrate that it is all or mostly in memory -- use iostat or other tools to ensure that there is not much disk activity during the query.  If your system doesn't have iostat installed, it should be installed.  It is a very useful tool.
If it is all cached in memory, you may want to ensure that your shared_buffers is a reasonalbe size so that there is less shuffling of data from the kernel to postgres and back.  Generally, shared_buffers works best between 5% and 25% of system memory.
If it is completely CPU bound then the work done for the query has to be reduced by altering the plan to a more optimal one or making the work it has to do at each step easier.  Most of the ideas in this thread revolve around those things.

Based on the time it took to do the vacuum, I suspect your disk subsystem is a bit slow.  If it can be determined that there is much disk I/O in your use cases, there are generally several things that can be done to tune Linux I/O.  The main ones in my experience are the 'readahead' value for each disk which helps sequential reads significantly, and trying out the linux 'deadline' scheduler and comparing it to the more commonly used 'cfq' scheduler.  If the system is configured with the anticipatory scheduler, absolutely switch to cfq or deadline as the anticipatory scheduler will perform horribly poorly for a database.

> Andrus.

In response to


pgsql-performance by date

Next:From: AndrusDate: 2008-11-23 21:35:41
Subject: Re: Hash join on int takes 8..114 seconds
Previous:From: AndrusDate: 2008-11-23 20:52:54
Subject: limit clause produces wrong query plan

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group