Skip site navigation (1) Skip section navigation (2)

Re: PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Stefan Keller <sfkeller(at)gmail(dot)com>
Cc: Wales Wang <wormwang(at)yahoo(dot)com>, pgsql-performance(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
Date: 2012-02-26 22:34:35
Message-ID: CAMkU=1wG13Ysmy=7_BNZPnaRBX6k7GF94ERyA_6amDXu_7y8dg@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Sun, Feb 26, 2012 at 2:56 AM, Stefan Keller <sfkeller(at)gmail(dot)com> wrote:
> Hi Jeff and Wales,
>
> 2012/2/26 Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>> The problem is that the initial queries are too slow - and there is no
>>> second chance. I do have to trash the buffer every night. There is
>>> enough main memory to hold all table contents.
>>
>> Just that table, or the entire database?
>
> The entire database consisting of only about 5 tables which are
> similar but with different geometry types plus a relations table (as
> OpenStreetMap calls it).

And all of those combined fit in RAM?  With how much to spare?

>
>>> 1. How can I warm up or re-populate shared buffers of Postgres?
>>
>> Instead, warm the OS cache.  Then data will get transferred into the
>> postgres shared_buffers pool from the OS cache very quickly.
>>
>> tar -c $PGDATA/base/ |wc -c
>
> Ok. So with "OS cache" you mean the files which to me are THE database itself?

Most operating systems will use any otherwise unused RAM to cache
"recently" accessed file-system data.  That is the OS cache.  The
purpose of the tar is to populate the OS cache with the "database
itself".  That way, when postgres wants something that isn't already
in shared_buffers, it doesn't require a disk read to get it, just a
request to the OS.

But this trick is most useful after the OS has been restarted so the
OS cache is empty.  If the OS has been up for a long time, then why
isn't it already populated with the data you need?  Maybe the data
doesn't fit, maybe some other process has trashed the cache (in which
case, why would it not continue to trash the cache on an ongoing
basis?)

Since you just recently created the tables and indexes, they must have
passed through the OS cache on the way to disk.  So why aren't they
still there?  Is shared_buffers so large that little RAM is left over
for the OS?  Did you reboot the OS?  Are there other processes running
that drive the database-specific files out of the OS cache?

> A cache to me is a second storage with "controlled redudancy" because
> of performance reasons.

Yeah.  But there are multiple caches, with different parties in
control and different opinions of what is redundant.

>>> 2. Are there any hints on how to tell Postgres to read in all table
>>> contents into memory?
>>
>> I don't think so, at least not in core.  I've wondered if it would
>> make sense to suppress ring-buffer strategy when there are buffers on
>> the free-list.  That way a sequential scan would populate
>> shared_buffers after a restart.  But it wouldn't help you get the
>> indexes into cache.
>
> So, are there any developments going on with PostgreSQL as Stephen
> suggested in the former thread?

I don't see any active development for the upcoming release, and most
of what has been suggested wouldn't help you because they are about
re-populating the cache with previously hot data, while you are
destroying your previously hot data and wanting to specify the
future-hot data.

By the way, your explain plan would be more useful if it included
buffers.  "Explain (analyze, buffers) select..."

I don't know that it is ever better to run analyze without buffers,
other than for backwards compatibility.  I'm trying to get in the
habit of just automatically doing it.

Cheers,

Jeff

In response to

Responses

pgsql-performance by date

Next:From: lephongvuDate: 2012-02-27 04:33:08
Subject: Re: Very long deletion time on a 200 GB database
Previous:From: Tom LaneDate: 2012-02-26 21:00:20
Subject: Re: Index condition in a Nested Loop

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group