First and foremost, thanks to the great number of people who have responded
with various tips and suggestions. I am now starting to fully appreciate
the various cache settings and what they can do for performance.
I just want to redefine the problem based on the knowledge of it that I now
In my example, I am purposefully forcing a full table scan - select count
(*) from table. This table has only 90,000 rows. Each row is comprised of
about 300 bytes of data.
If the table has not been cached, I was seeing response times from 5 to 18
seconds to read the table. If it had been cached, then the response time
dropped to sub-second response.
Obviously, I can tune the caching so as to make sure that as much data that
can be reasonably cached is cached. However, I don't think that a hit of
even 5 seconds to read a table of 90,000 rows is acceptable.
One thing that has been tried with some success was to dump the table and
recreate it. After this exercise, selecting all rows from the table when it
is not in cache takes about 3 seconds. (Of course, when in cache, the same
sub-second response time is seen.)
I still think that 3 seconds is not acceptable. However, I reserve the
right to be wrong. Does it sound unrealistic to expect PostgreSQL to be
able to read 90,000 rows with 300 bytes per row in under a second?
Based on suggestions from the list, I am also thinking of making the
following tuning changes:
shared_buffers = 15000
sort_mem = 32168
effective_cache_size = 64000
This is based on one gig of memory.
Does anyone have any feedback on these values? Also, realizing that no two
database are the same, etc., etc... does anyone have a good formula for
setting these values?
Thanks in advance,
At 09:09 AM 9/16/2003 +0200, Manfred Koizar wrote:
>On Mon, 15 Sep 2003 22:26:45 -0400, Christopher Browne
> >> select count (*) from table;
> >The only possible plan for THAT query will involve a seq scan of the
> >whole table. If the postmaster already has the data in cache, it
> >makes sense for it to run in 1 second. If it has to read it from
> >disk, 12 seconds makes a lot of sense.
>Yes. And note that the main difference is between having the data in
>memory and having to fetch it from disk. I don't believe that this
>difference can be explained by 9000 read calls hitting the operating
> >You might want to increase the "shared_buffers" parameter in
> >postgresql.conf; that should lead to increased stability of times as
> >it should be more likely that the data in "table" will remain in
>Let's not jump to this conclusion before we know what's going on.
>Joseph Bove <jbove(at)vetstar(dot)com> wrote in another message above:
>| I did have shared_buffers and sort_mem both set higher originally (15000,
>As I read this I think he meant "... and had the same performance
>Joseph, what do you get, if you run that
> EXPLAIN ANALYSE SELECT count(*) ...
>several times? What do vmstat and top show while the query is
>running? Are there other processes active during or between the runs?
>What kind of processes? Postgres backends? Web server? ...
>---------------------------(end of broadcast)---------------------------
>TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
In response to
pgsql-performance by date
|Next:||From: Jeff||Date: 2003-09-16 15:38:12|
|Subject: Re: Inconsistent performance|
|Previous:||From: Tom Lane||Date: 2003-09-16 14:05:06|
|Subject: Re: Attempt at work around of int4 query won't touch int8 index ... |