回复: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

From: Wales Wang <wormwang(at)yahoo(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Stefan Keller <sfkeller(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: 回复: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
Date: 2012-02-27 12:46:38
Message-ID: 1330346798.90731.YahooMailNeo@web30808.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

There are many approach for PostgreSQL in-memory.

The quick and easy way is making slave pgsql run on persistent RAM filesystem, the slave is part of master/slave replication cluster.
 
The fstab and script make RAM file system persistent is below:
Setup:
First, create a mountpoint for the disk :
mkdir /mnt/ramdisk
Secondly, add this line to /etc/fstab in to mount the drive at boot-time.
tmpfs           /mnt/ramdisk tmpfs      defaults,size=65536M 0 0

#! /bin/sh
# /etc/init.d/ramdisk.sh
#
 
case "$1" in
  start)
    echo "Copying files to ramdisk"
    rsync -av /data/ramdisk-backup/ /mnt/ramdisk/
    echo [`date +"%Y-%m-%d %H:%M"`] Ramdisk Synched from HD >> /var/log/ramdisk_sync.log
    ;;
  sync)
    echo "Synching files from ramdisk to Harddisk"
    echo [`date +"%Y-%m-%d %H:%M"`] Ramdisk Synched to HD >> /var/log/ramdisk_sync.log
    rsync -av --delete --recursive --force /mnt/ramdisk/ /data/ramdisk-backup/
    ;;
  stop)
    echo "Synching logfiles from ramdisk to Harddisk"
    echo [`date +"%Y-%m-%d %H:%M"`] Ramdisk Synched to HD >> /var/log/ramdisk_sync.log
    rsync -av --delete --recursive --force /mnt/ramdisk/ /data/ramdisk-backup/
    ;;
  *)
    echo "Usage: /etc/init.d/ramdisk {start|stop|sync}"
    exit 1
    ;;
esac
exit 0
 
you can run it when startup and shutdown and crontabe hoursly.
 
Wales Wang

________________________________
发件人: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
收件人: Stefan Keller <sfkeller(at)gmail(dot)com>
抄送: Wales Wang <wormwang(at)yahoo(dot)com>; pgsql-performance(at)postgresql(dot)org; Stephen Frost <sfrost(at)snowman(dot)net>
发送日期: 2012年2月27日, 星期一, 上午 6:34
主题: Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

On Sun, Feb 26, 2012 at 2:56 AM, Stefan Keller <sfkeller(at)gmail(dot)com> wrote:
> Hi Jeff and Wales,
>
> 2012/2/26 Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>> The problem is that the initial queries are too slow - and there is no
>>> second chance. I do have to trash the buffer every night. There is
>>> enough main memory to hold all table contents.
>>
>> Just that table, or the entire database?
>
> The entire database consisting of only about 5 tables which are
> similar but with different geometry types plus a relations table (as
> OpenStreetMap calls it).

And all of those combined fit in RAM?  With how much to spare?

>
>>> 1. How can I warm up or re-populate shared buffers of Postgres?
>>
>> Instead, warm the OS cache. 燭hen data will get transferred into the
>> postgres shared_buffers pool from the OS cache very quickly.
>>
>> tar -c $PGDATA/base/ |wc -c
>
> Ok. So with "OS cache" you mean the files which to me are THE database itself?

Most operating systems will use any otherwise unused RAM to cache
"recently" accessed file-system data.  That is the OS cache.  The
purpose of the tar is to populate the OS cache with the "database
itself".  That way, when postgres wants something that isn't already
in shared_buffers, it doesn't require a disk read to get it, just a
request to the OS.

But this trick is most useful after the OS has been restarted so the
OS cache is empty.  If the OS has been up for a long time, then why
isn't it already populated with the data you need?  Maybe the data
doesn't fit, maybe some other process has trashed the cache (in which
case, why would it not continue to trash the cache on an ongoing
basis?)

Since you just recently created the tables and indexes, they must have
passed through the OS cache on the way to disk.  So why aren't they
still there?  Is shared_buffers so large that little RAM is left over
for the OS?  Did you reboot the OS?  Are there other processes running
that drive the database-specific files out of the OS cache?

> A cache to me is a second storage with "controlled redudancy" because
> of performance reasons.

Yeah.  But there are multiple caches, with different parties in
control and different opinions of what is redundant.

>>> 2. Are there any hints on how to tell Postgres to read in all table
>>> contents into memory?
>>
>> I don't think so, at least not in core. 營've wondered if it would
>> make sense to suppress ring-buffer strategy when there are buffers on
>> the free-list. 燭hat way a sequential scan would populate
>> shared_buffers after a restart. 燘ut it wouldn't help you get the
>> indexes into cache.
>
> So, are there any developments going on with PostgreSQL as Stephen
> suggested in the former thread?

I don't see any active development for the upcoming release, and most
of what has been suggested wouldn't help you because they are about
re-populating the cache with previously hot data, while you are
destroying your previously hot data and wanting to specify the
future-hot data.

By the way, your explain plan would be more useful if it included
buffers.  "Explain (analyze, buffers) select..."

I don't know that it is ever better to run analyze without buffers,
other than for backwards compatibility.  I'm trying to get in the
habit of just automatically doing it.

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2012-02-27 14:39:23 Re: Re: [PERFORM] Disable-spinlocks while compiling postgres 9.1 for ARM Cortex A8
Previous Message Reuven M. Lerner 2012-02-27 08:08:15 Re: Very long deletion time on a 200 GB database