Re: How does PG know if data is in memory?

From: Fabrício dos Anjos Silva <fabricio(dot)silva(at)linkcom(dot)com(dot)br>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: How does PG know if data is in memory?
Date: 2010-10-01 11:12:36
Message-ID: AANLkTinrtVbjtadteR88xWGGYovAzhgOWR1=pXLDi08M@mail.gmail.com
Lists: pgsql-performance

Thank you all for the replies.

If PG does not know whether the needed data is in memory, how does it
estimate cost? There is a huge difference between access time in memory
and in secondary storage. Not taking this into account results in almost
"useless" estimates. I am not saying that PG does a poor job, but I've
been using it for 4 years and from time to time I notice very poor
estimates. After some testing 2 years ago, the only configuration I could
make work was to tell PG to avoid Seq Scans and Index Scans. I know that
in many situations these scan types are the best choice, but when they
are picked where they are not suitable, I get very bad plans.
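
Just to illustrate what I mean, here is the kind of experiment I have
been running (the table name and numbers are made up; EXPLAIN and the
enable_seqscan/enable_indexscan switches are PostgreSQL's own):

    -- The planner's cost for an unqualified seq scan is roughly
    --   seq_page_cost * pages + cpu_tuple_cost * rows,
    -- and the estimate is the same whether or not the pages are cached.
    EXPLAIN SELECT * FROM big_table;      -- big_table is hypothetical

    -- What I ended up doing was steering the planner per session:
    SET enable_seqscan = off;             -- discourage sequential scans
    EXPLAIN ANALYZE SELECT * FROM big_table WHERE id = 42;
    RESET enable_seqscan;                 -- enable_indexscan works the same way
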
Recently, I faced poor performance again, but this time because we
started to work with larger tables (10M rows). This encouraged me to
study PG tuning again, trying to understand how the planner works and to
get the best out of it. Unfortunately, it does not seem to be an easy task.
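
As a first step in that study, I have at least been checking which
planner cost parameters are in effect (pg_settings is the standard
catalog view):

    SELECT name, setting
      FROM pg_settings
     WHERE name IN ('seq_page_cost', 'random_page_cost', 'cpu_tuple_cost',
                    'effective_cache_size', 'shared_buffers');
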
If someone could point me to good books about PG tuning, I would
appreciate it. I found some yet-to-be-released books about PG 9. Any
comments on them?

Thank you all.

Fabrício dos Anjos Silva
LinkCom Soluções em T.I.

On 29 September 2010 at 14:08, Samuel Gendler
<sgendler(at)ideasculptor(dot)com> wrote:

>
>
> 2010/9/29 Fabrício dos Anjos Silva <fabricio(dot)silva(at)linkcom(dot)com(dot)br>
>
>
>>
>> When setting seq_page_cost and random_page_cost, do I have to consider
>> the probability that data will be in memory? Or does seq_page_cost mean
>> "sequential access on disk" and random_page_cost mean "random access on
>> disk"?
>>
>>
> The reason seq_page_cost and random_page_cost exist as parameters is so
> that you can inform the optimizer what the relative costs of those actions
> are, which is directly related to the expected size of the filesystem cache,
> ratio of total db size to available cache memory, and the performance of
> your disk i/o subsystems (and any other disk-related work the host may be
> doing). effective_cache_size allows you to tell postgres how big you
> believe all available cache memory is - shared_buffers and OS cache.
>
> As to your question about increasing shared_buffers to be some significant
> proportion of available RAM - apparently, that is not a good idea. I've
> seen advice that said you shouldn't go above 8GB for shared_buffers and I've
> also seen 12GB suggested as an upper limit, too. On my host with 48GB of
> RAM, I didn't see much difference between 8GB and 12GB on a fairly wide
> variety of tests, so mine is set at 8GB with an effective_cache_size of 36GB.
>
>
>> I would appreciate it if someone could clear this up.
>>
>> Thanks!
>>
>> Fabrício dos Anjos Silva
>> LinkCom Soluções em T.I.
>>
>>
>
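
For reference, a minimal sketch of what Samuel describes for his 48GB
host, as postgresql.conf entries (his numbers, shown only as an
illustration, not as a recommendation for other hardware):

    shared_buffers = 8GB          # within the commonly advised 8-12GB ceiling
    effective_cache_size = 36GB   # shared_buffers plus expected OS cache
    seq_page_cost = 1.0           # relative planner costs; the defaults are
    random_page_cost = 4.0        #   1.0 and 4.0 respectively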
