Re: Possible explanations for catastrophic performance deterioration?

From: Carlos Moreno <moreno_pg(at)mochima(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Possible explanations for catastrophic performance deterioration?
Date: 2007-09-23 20:33:30
Message-ID: 46F6CD9A.2090600@mochima.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Jonah H. Harris wrote:
> On 9/23/07, Carlos Moreno <moreno_pg(at)mochima(dot)com> wrote:
>> Wait a second --- am I correct in understanding then that the bloating
>> you guys are referring to occurs *in memory*??
>
> No, bloating occurs on-disk; but this does affect memory. Bloat means
> that even though your table data may take up 1G after the initial
> load, due to poor vacuuming, table layouts, etc. it to equal something
> more... say 2G.
>
> The thing is, even though the table only stores 1G of data, it is now
> physically 2G. So, anything that would need to read the entire table
> (like COUNT(*)), or large sections of it sequentially, are performing
> twice as many I/Os to do so.

OK --- that was my initial impression... But again, then I'm still puzzled
about why *the second time* that I load the query, it still take a few
seconds.

That is: the first time I run the query, it has to go through the disk;
in the normal case it would have to read 100MB of data, but due to
bloating,
it actually has to go through 2GB of data. Ok, but then, it will load
only 100MB (the ones that are not "uncollected disk garbage") to memory.
The next time that I run the query, the server would only need to read
100MB from memory --- the result should be instantaneous...

The behaviour I observed was: first time I run the query took over one
minute; second time, a little above two seconds. Tried four or five times
more; in every instance it was around 2 seconds. On the new server, *the
first time* I run the query, it takes *no time* (I repeat: *no time*
--- as
in perhaps 10 to 100 msec; in any case, my eyes could not resolve between
the moment I hit enter and the moment I see the result with the count of
rows --- that's between one and two orders of magnitude faster than with
the
old server --- and again, we're comparing *the first* time I execute the
query
on the new machine, in which case it is expected that it would have to read
from disk, compared to the second and subsequent times that I execute it on
the old machine, in which case, since the bloating does not occur in
memory,
the entire seq. scan should occur exclusively in memory ... )

That's what still puzzles me --- Alvaro's reply seemed to explain it if I
accept that the bloating affects memory (dead tuples loaded to memory
reduce
the capacity to load the entire dataset into memory)...

Someone could shed some light and point out if there's still something I'm
missing or some other mistake in my analysis?? Hope I'm not sounding like
I'm being dense!!

Thanks,

Carlos
--

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alvaro Herrera 2007-09-23 20:57:59 Re: Possible explanations for catastrophic performance deterioration?
Previous Message Jonah H. Harris 2007-09-23 19:24:10 Re: Possible explanations for catastrophic performace deterioration?