Re: Proposal : For Auto-Prewarm.

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : For Auto-Prewarm.
Date: 2017-05-30 13:10:16
Message-ID: CAA4eK1JPHzz2-jTtb3KSVJRJW7XM7sjvCGT3TiLYq6FavRVFuQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 30, 2017 at 12:36 PM, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> On 27.10.2016 14:39, Mithun Cy wrote:
>
>
> Have you considered parallel prewarming of a table?
> Right now, with either pg_prewarm or pg_autoprewarm, preloading of a
> table's data is performed by a single backend.
> That certainly makes sense if there is just one HDD and we want to minimize
> the impact of pg_prewarm on normal DBMS activity.
> But sometimes we need to load data into memory as soon as possible. Modern
> systems have a large number of CPU cores, and RAID devices make it possible
> to load data efficiently in parallel.
>
> I am asking this question in the context of my CFS (compressed file system)
> for Postgres. A customer complained that his system has 64 cores, but when
> he builds an index, decompression of heap data is performed by only one
> core. This is why I thought about prewarm... (parallel index construction is
> a separate story...)
>
> pg_prewarm makes it possible to specify a range of blocks, so, in principle,
> a table can be preloaded in parallel manually, by running pg_prewarm with
> different subranges in several backends. But that is definitely not a
> user-friendly approach.
> And as far as I understand, pg_autoprewarm already has all the infrastructure
> needed for a parallel load. We just need to spawn more than one background
> worker and assign a separate block range to each worker.
>
> Do you think that such functionality (parallel autoprewarm) would be useful
> and could be easily added?
>
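A minimal sketch of the manual approach described in the quoted mail, written in Python: it splits a relation of `nblocks` blocks into contiguous, inclusive ranges and emits one `pg_prewarm(regclass, mode, fork, first_block, last_block)` call per range, each meant to be run from its own session. The helper names and the table name `my_table` are hypothetical; the block count is assumed to come from something like `SELECT pg_relation_size('my_table') / current_setting('block_size')::int`.

```python
from typing import List, Tuple


def split_block_range(nblocks: int, nworkers: int) -> List[Tuple[int, int]]:
    """Split blocks 0..nblocks-1 into at most nworkers contiguous ranges.

    Each range is (first_block, last_block), both inclusive, matching the
    meaning of pg_prewarm's first_block/last_block arguments.
    """
    if nblocks <= 0 or nworkers <= 0:
        return []
    base, extra = divmod(nblocks, nworkers)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for i in range(nworkers):
        # The first `extra` ranges get one more block, so sizes differ by at most 1.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more workers than blocks: do not emit empty ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def prewarm_statements(relname: str, nblocks: int, nworkers: int) -> List[str]:
    """Build one pg_prewarm() call per range (hypothetical helper)."""
    # Naive quoting for illustration only; real code should bind parameters.
    lit = relname.replace("'", "''")
    return [
        f"SELECT pg_prewarm('{lit}'::regclass, 'buffer', 'main', {first}, {last});"
        for first, last in split_block_range(nblocks, nworkers)
    ]


if __name__ == "__main__":
    for stmt in prewarm_statements("my_table", 10, 3):
        print(stmt)
```

For a 10-block table and 3 workers this yields calls covering blocks 0-3, 4-6 and 7-9. Each statement would then have to be issued from a separate connection (for example, several psql sessions) for the loads to actually run concurrently, which is exactly the user-unfriendly bookkeeping the mail complains about.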

I think parallel load functionality can be useful in a few cases, such as
when the system has multiple I/O channels. Doing it in parallel might
need some additional infrastructure to manage the workers, depending on
how we decide to parallelize: whether we allow each worker to pick one
block at a time and load it, or assign a range of blocks to each worker.
Each way has its own pros and cons. Even if we want to add such an
option to the *prewarm functionality, it seems it should be added as a
separate patch, as it has its own set of problems that need to be
solved.
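To make the "each worker picks one block" option concrete, here is a small Python simulation, not PostgreSQL code: threads stand in for background workers, a shared counter guarded by a lock stands in for whatever shared-memory state a real implementation would need, and `load_block` is a hypothetical callback in place of actually reading a buffer. Each worker repeatedly claims the next unclaimed block until none remain.

```python
import threading
from typing import Callable, List


def prewarm_dynamic(nblocks: int, nworkers: int,
                    load_block: Callable[[int], None]) -> List[List[int]]:
    """Simulate workers that each claim one block at a time from a shared counter.

    Returns, for each worker, the block numbers it loaded, in claim order.
    """
    next_block = 0                  # next unclaimed block number (shared state)
    lock = threading.Lock()         # protects next_block
    loaded: List[List[int]] = [[] for _ in range(nworkers)]

    def worker(wid: int) -> None:
        nonlocal next_block
        while True:
            # Claim the next block atomically; stop once every block is taken.
            with lock:
                if next_block >= nblocks:
                    return
                blkno = next_block
                next_block += 1
            # The "read" happens outside the lock, so workers overlap here.
            load_block(blkno)
            loaded[wid].append(blkno)  # each worker only touches its own list

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return loaded


if __name__ == "__main__":
    import time

    # Hypothetical stand-in for reading one block into shared buffers.
    def fake_load(blkno: int) -> None:
        time.sleep(0.001)

    per_worker = prewarm_dynamic(nblocks=40, nworkers=4, load_block=fake_load)
    for wid, blocks in enumerate(per_worker):
        print(f"worker {wid}: {len(blocks)} blocks")
```

One plausible trade-off (not spelled out in the mail): this scheme balances load on its own, since a worker whose blocks load quickly simply claims more, but it needs synchronized shared state touched once per block. Fixed per-worker ranges need no coordination after start-up, yet can leave some workers idle if their ranges happen to be cheaper to load.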

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
