Re: Possible explanations for catastrophic performance deterioration?

From: Carlos Moreno <moreno_pg(at)mochima(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Possible explanations for catastrophic performance deterioration?
Date: 2007-09-23 21:55:49
Message-ID: 46F6E0E5.3080208@mochima.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Alvaro Herrera wrote:
> Carlos Moreno wrote:
>
>
>> That is: the first time I run the query, it has to go through the
>> disk; in the normal case it would have to read 100MB of data, but due
>> to bloating, it actually has to go through 2GB of data. Ok, but
>> then, it will load only 100MB (the ones that are not "uncollected
>> disk garbage") to memory. The next time that I run the query, the
>> server would only need to read 100MB from memory --- the result should
>> be instantaneous...
>
> Wrong. If there is 2GB of data, 1900MB of which is dead tuples, those
> pages would still have to be scanned for the count(*). The system does
> not distinguish "pages which have no live tuples" from other pages, so
> it has to load them all.

Yes, that part I understand --- I think I now know what the error is in
my logic. I was thinking as follows: We read 2GB of which 1900MB are
dead tuples. But then, once they're read, the system will only keep
in memory the 100MB that are valid tuples.

I'm now thinking that the problem with my logic is that the system does
not keep anything in memory (or not all tuples, in any case), since it
is only counting, so it does not *have to* keep them, and since the
total amount of reading from the disk exceeds the amount of physical
memory, then the valid tuples are "pushed out" of memory.

So, the second time I execute the query, it will still need to scan the
disk (in my mind, the way I was seeing it, the second time I execute
the "select count(*) from customer", the entire customer table would be
in memory from the previous time, and that's why I was thinking that
the bloating would not explain why the second time it is still slow).

Am I understanding it right?

Thanks for your patience!

Carlos
--

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jonah H. Harris 2007-09-23 22:29:02 Re: Possible explanations for catastrophic performance deterioration?
Previous Message Alvaro Herrera 2007-09-23 20:57:59 Re: Possible explanations for catastrophic performance deterioration?