Re: How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale
Date: 2014-10-15 13:58:02
Message-ID: 9A28C8860F777E439AA12E8AEA7694F801056956@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > On 10/03/2014 07:08 AM, Kouhei Kaigai wrote:
> > > Hello,
> > >
> > > I recently got a trouble on development of my extension that
> > > utilizes the shared buffer when it released each buffer page.
> > >
> > > This extension transfers contents of the shared buffers to GPU
> > > device using DMA feature, then kicks a device kernel code.
> >
> > Wow, that sounds crazy.
> >
> > > Once backend/extension calls ReadBuffer(), resowner.c tracks which
> > > buffer was referenced by the current resource owner, to ensure these
> > > buffers being released at end of the transaction.
> > > However, it seems to me implementation of resowner.c didn't assume
> > > many buffers are referenced by a particular resource owner
> simultaneously.
> > > It manages the buffer index using an expandable array, then looks up
> > > the target buffer by sequential walk but from the tail because
> > > recently pinned buffer tends to be released first.
> > > It made a trouble in my case. My extension pinned multiple thousands
> > > buffers, so owner->buffers[] were enlarged and takes expensive cost
> > > to walk on.
> > > In my measurement, ResourceOwnerForgetBuffer() takes 36 seconds in
> > > total during hash-joining 2M rows; even though hash-joining itself
> > > takes less than 16 seconds.
> > >
> > > What is the best way to solve the problem?
> >
> > How about creating a separate ResourceOwner for these buffer pins, and
> > doing a wholesale ResourceOwnerRelease() on it when you're done?
> >
> Let me clarify your idea.
>
> 1. Create a separate ResourceOwner under the CurrentResourceOwner.
> 2. Switch CurrentResourceOwner to the self-constructed resource owner
> during ReadBuffer() on thousands buffers.
> 3. Switch back to the original CurrentResourceOwner.
> 4. Kick DMA and run GPU device kernel (actual job running...) 5. Switch
> CurrentResourceOwner to the self-constructed resource owner
> again, during ReleaseBuffer() in reverse order.
> 6. Switch back to the original CurrentResourceOwner, then calls
> ResourceOwnerDelete().
>
> Hmm... at this moment, I cannot find something not easy to implement.
> I'd like to try this idea, then report it later.
>
Let me share the result.

The above approach could eliminated my problem. Individual resource-owner
for each set of shared-buffer and ReleaseBuffer() in reverse order reduced
time to release pinned buffers from 36sec->67.3msec when I run hash-joining
on 20M records; that involves 59168 buffers are pinned concurrently in
maximum (32 asynchronous requests, a request packs 1849 buffers).

Thanks for the suggestion!

postgres=# explain analyze select * from t0 natural join t1 natural join t2;
INFO: time to release buffers: 67.289ms
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
Custom (GpuHashJoin) (cost=3468.00..471640.11 rows=19740924 width=139) (actual time=193.086..5913.459 rows=20000000 loops=1)
pseudo scan tlist: 1:(t0.bid), 2:(t0.aid), 3:<t0.id>, 4:<t0.cat>, 5:<t0.cid>, 6:<t0.did>, 7:<t0.x>, 8:<t0.y>, 9:<t0.z>, 11:<t2.btext>, 12:[t2.bid], 10:<t1.atext>, 13:[t1.aid]
hash clause 1: (t0.bid = t2.bid)
hash clause 2: (t0.aid = t1.aid)
-> Custom (GpuScan) on t0 (cost=500.00..467167.24 rows=20000024 width=73) (actual time=7.757..1056.426 rows=20000000 loops=1)
-> Custom (MultiHash) (cost=734.00..734.00 rows=40000 width=37) (actual time=23.382..23.382 rows=40000 loops=1)
hash keys: bid
-> Seq Scan on t2 (cost=0.00..734.00 rows=40000 width=37) (actual time=0.007..5.124 rows=40000 loops=1)
-> Custom (MultiHash) (cost=734.00..734.00 rows=40000 width=37) (actual time=11.919..11.919 rows=40000 loops=1)
hash keys: aid
-> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=37) (actual time=0.010..5.299 rows=40000 loops=1)
Planning time: 0.904 ms
Execution time: 6667.986 ms
(13 rows)

--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-10-15 14:04:52 Re: Locking for Rename To new_name works differently for different objects
Previous Message Stephen Frost 2014-10-15 13:50:21 Re: Buffer Requests Trace