Re: Proposal : For Auto-Prewarm.

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal : For Auto-Prewarm.
Date: 2017-05-30 07:06:15
Message-ID: 22a2c53d-7e7b-79c7-783a-30b232b3d001@postgrespro.ru
Lists: pgsql-hackers

On 27.10.2016 14:39, Mithun Cy wrote:
> # pg_autoprewarm.
>
> This a PostgreSQL contrib module which automatically dump all of the blocknums
> present in buffer pool at the time of server shutdown(smart and fast mode only,
> to be enhanced to dump at regular interval.) and load these blocks when server
> restarts.
>
> Design:
> ------
> We have created a BG Worker Auto Pre-warmer which during shutdown dumps all the
> blocknum in buffer pool in sorted order.
> Format of each entry is
> <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>.
> Auto Pre-warmer is started as soon as the postmaster is started we do not wait
> for recovery to finish and database to reach a consistent state. If there is a
> "dump_file" to load we start loading each block entry to buffer pool until
> there is a free buffer. This way we do not replace any new blocks which was
> loaded either by recovery process or querying clients. Then it waits until it
> receives SIGTERM to dump the block information in buffer pool.
>
> HOW TO USE:
> -----------
> Build and add the pg_autoprewarm to shared_preload_libraries. Auto Pre-warmer
> process automatically do dumping of buffer pool's block info and load them when
> restarted.
>
> TO DO:
> ------
> Add functionality to dump based on timer at regular interval.
> And some cleanups.

I wonder if you have considered parallel prewarming of a table?
Right now, with either pg_prewarm or pg_autoprewarm, preloading a
table's data is performed by a single backend.
That certainly makes sense if there is just one HDD and we want to
minimize the impact of pg_prewarm on normal DBMS activity.
But sometimes we need to load data into memory as soon as possible.
Modern systems have a large number of CPU cores, and RAID devices make
it possible to load data efficiently in parallel.

This question came up in the context of my CFS (compressed file system)
for Postgres. A customer complained that his system has 64 cores, but
when he builds an index, decompression of heap data is performed by
only one core. This is why I thought about prewarm... (parallel index
construction is a separate story...)

pg_prewarm makes it possible to specify a range of blocks, so in
principle it is possible to preload a table in parallel manually, by
running pg_prewarm with different subranges in several backends. But
this is definitely not a user-friendly approach.
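
A minimal sketch of that manual approach, assuming the table's block
count is already known: it splits the block range into contiguous
subranges and builds one pg_prewarm call per subrange, using the
documented arguments (relation, mode, fork, first_block, last_block).
The helper names split_block_range and prewarm_statements are
hypothetical, and each generated statement would still have to be run
in its own session to get any parallelism.

```python
def split_block_range(total_blocks, workers):
    """Split blocks 0..total_blocks-1 into contiguous inclusive (first, last) ranges."""
    if total_blocks <= 0 or workers <= 0:
        return []
    # Never create more ranges than there are blocks.
    workers = min(workers, total_blocks)
    base, extra = divmod(total_blocks, workers)
    ranges = []
    start = 0
    for i in range(workers):
        # The first `extra` ranges take one additional block each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def prewarm_statements(relation, total_blocks, workers, mode="buffer", fork="main"):
    """Build one pg_prewarm statement per subrange (illustration only:
    the relation name is interpolated without any escaping)."""
    return [
        f"SELECT pg_prewarm('{relation}', '{mode}', '{fork}', {first}, {last});"
        for first, last in split_block_range(total_blocks, workers)
    ]


if __name__ == "__main__":
    # "my_table" and the block count are made-up example values.
    for stmt in prewarm_statements("my_table", 1000, 4):
        print(stmt)
```

The block count could be derived from pg_relation_size() divided by the
block size; that step is left out here.
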
And as far as I understand, pg_autoprewarm has all the infrastructure
necessary to do a parallel load. We just need to spawn more than one
background worker and specify a separate block range for each worker.
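
To illustrate the idea (this is only a sketch of the partitioning step,
not how the patch works): assuming the dump file holds sorted entries in
the <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> format quoted
above, they could be cut into contiguous slices, one per worker, so that
each worker still reads neighbouring blocks. BlockEntry and
partition_entries are hypothetical names.

```python
from collections import namedtuple

# Mirrors the quoted dump entry format; the field names are assumptions.
BlockEntry = namedtuple(
    "BlockEntry", "database tablespace relation forknum blocknum"
)


def partition_entries(entries, workers):
    """Cut the sorted dump entries into contiguous slices, one per worker."""
    entries = sorted(entries)
    if workers <= 0 or not entries:
        return []
    # Never create more slices than there are entries.
    workers = min(workers, len(entries))
    base, extra = divmod(len(entries), workers)
    chunks = []
    start = 0
    for i in range(workers):
        # The first `extra` slices take one additional entry each.
        size = base + (1 if i < extra else 0)
        chunks.append(entries[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Made-up OIDs and block numbers, purely for demonstration.
    dump = [BlockEntry(1, 1663, 16384, 0, b) for b in range(5)]
    for n, chunk in enumerate(partition_entries(dump, 2)):
        print(n, [e.blocknum for e in chunk])
```

Contiguous slices keep each worker's reads close together on disk; a
round-robin split would balance just as well but would scatter them.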

Do you think that such functionality (parallel autoprewarm) would be
useful and could be added easily?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
