Re: [Patch] Optimize dropping of relation buffers using dlist

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "tomas(dot)vondra(at)2ndquadrant(dot)com" <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-11-10 03:03:26
Message-ID: CAA4eK1KwQGATfnY5HBGgirM2h4L7n3eEKo25HzKH5wiD9a2uAA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 10, 2020 at 8:19 AM k(dot)jamison(at)fujitsu(dot)com
<k(dot)jamison(at)fujitsu(dot)com> wrote:
>
> > From: k(dot)jamison(at)fujitsu(dot)com <k(dot)jamison(at)fujitsu(dot)com>
> > On Thursday, October 22, 2020 3:15 PM, Kyotaro Horiguchi
> > <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > I'm not sure about the exact steps of the test, but it can be expected
> > > if we have many small relations to truncate.
> > >
> > > Currently BUF_DROP_FULL_SCAN_THRESHOLD is set to NBuffers / 512, which
> > > is quite arbitrary and comes from a wild guess.
> > >
> > > Perhaps we need to run benchmarks that drop one relation with several
> > > different ratios between the number of to-be-dropped buffers and
> > > NBuffers, preferably both on spinning rust and on SSD.
> >
> > Sorry to get back to you on this just now.
> > Since we're prioritizing the vacuum patch, we also need to finalize which
> > threshold value to use.
> > I proceeded to test with my latest set of patches because Amit-san's
> > comments on the code, the ones we addressed, don't really affect the
> > performance. I'll post the updated patches for 0002 & 0003 after we come up
> > with the proper boolean parameter name for smgrnblocks and the buffer
> > full-scan threshold value.
> >
> > I tested the VACUUM performance with the following thresholds:
> > NBuffers/512, NBuffers/256, and NBuffers/128, to determine which of the
> > ratios has the best performance in terms of speed.
> >
> > I tested this on my machine (4 vCPUs, 8GB memory, ext4) running on an SSD,
> > with a streaming replication environment configured as follows:
> > shared_buffers = 100GB
> > autovacuum = off
> > full_page_writes = off
> > checkpoint_timeout = 30min
> >
> > [Steps]
> > 1. Create TABLE
> > 2. INSERT data
> > 3. DELETE from TABLE
> > 4. Pause WAL replay on standby
> > 5. VACUUM. Stop the primary.
> > 6. Resume WAL replay and promote standby.
> >
> > With 1 relation, there were no significant changes that we could observe
> > (times in seconds):
> > | s_b | Master | NBuffers/512 | NBuffers/256 | NBuffers/128 |
> > |-------|--------|--------------|--------------|--------------|
> > | 128MB | 0.106 | 0.105 | 0.105 | 0.105 |
> > | 100GB | 0.106 | 0.105 | 0.105 | 0.105 |
> >
> > So I tested with 100 tables and got more convincing measurements:
> >
> > | s_b | Master | NBuffers/512 | NBuffers/256 | NBuffers/128 |
> > |-------|--------|--------------|--------------|--------------|
> > | 128MB | 1.006 | 1.007 | 1.006 | 0.107 |
> > | 1GB | 0.706 | 0.606 | 0.606 | 0.605 |
> > | 20GB | 1.907 | 0.606 | 0.606 | 0.605 |
> > | 100GB | 7.013 | 0.706 | 0.606 | 0.607 |
> >
> > The threshold NBuffers/128 has the best performance for the default
> > shared_buffers (128MB) at 0.107 s, and performs equally well with large
> > shared_buffers up to 100GB.
> >
> > We can use NBuffers/128 for the threshold, although I don't have a
> > measurement for HDD yet.
> > However, I wonder whether the above method suffices to determine the final
> > threshold that we can use. If anyone has suggestions on how we can come
> > up with the final value, for example whether I need to modify some of the
> > steps above, I'd appreciate it.
> >
> > Regarding the parameter name: instead of "accurate", we can use "cached", as
> > originally intended in the early versions of the patch, since it is the smgr
> > that handles smgrnblocks and returns the block count from smgr_cached_nblocks.
> > "accurate" may confuse us because the cached value may not actually be
> > accurate.
>
> Hi,
>
> So I proceeded to update the patches to use the "cached" parameter and updated
> the corresponding comments in 0002.
>
> I've addressed Amit-san's suggestions and comments on 0003:
> 1. For readability, I moved the code block into a new static function FindAndDropRelFileNodeBuffers().
> 2. Initialized the bool "cached" to false.
> 3. We also decided that we don't need the extra pre-check of the RelFileNode
> when locking the buffer header in FindAndDropRelFileNodeBuffers.
>
> I repeated the recovery performance test for VACUUM. (I previously made a mistake in the NBuffers/128 measurement.)
> The three thresholds are almost equally performant, so I chose NBuffers/256 for this patch.
>
> | s_b | Master | NBuffers/512 | NBuffers/256 | NBuffers/128 |
> |-------|--------|--------------|--------------|--------------|
> | 128MB | 1.006 | 1.007 | 1.007 | 1.007 |
> | 1GB | 0.706 | 0.606 | 0.606 | 0.606 |
> | 20GB | 1.907 | 0.606 | 0.606 | 0.606 |
> | 100GB | 7.013 | 0.706 | 0.606 | 0.606 |
>
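
Just to restate the approach in code form as I understand it from this
thread, the decision between the optimized path and the full buffer-pool
scan would look roughly like the sketch below. This is only a sketch based
on the names discussed here (BUF_DROP_FULL_SCAN_THRESHOLD, smgrnblocks with
the "cached" flag, FindAndDropRelFileNodeBuffers); the exact signatures and
variables are my guesses, not copied from the patch:

    /* threshold under discussion: NBuffers / 256 */
    #define BUF_DROP_FULL_SCAN_THRESHOLD        (uint64) (NBuffers / 256)

    static void
    DropRelFileNodeBuffers(SMgrRelation smgr_reln, ForkNumber *forkNum,
                           int nforks, BlockNumber *firstDelBlock)
    {
        BlockNumber nForkBlock[MAX_FORKNUM];
        uint64      nBlocksToInvalidate = 0;
        bool        cached = false;
        int         i;

        /*
         * Get the number of blocks per fork.  The "cached" flag tells us
         * whether smgrnblocks answered from smgr_cached_nblocks; only then
         * is the value reliable enough (during recovery) to take the
         * optimized path.
         */
        for (i = 0; i < nforks; i++)
        {
            nForkBlock[i] = smgrnblocks(smgr_reln, forkNum[i], &cached);
            if (!cached)
                break;
            nBlocksToInvalidate += (nForkBlock[i] - firstDelBlock[i]);
        }

        if (cached && nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)
        {
            /* Optimized path: probe only the to-be-dropped blocks. */
            for (i = 0; i < nforks; i++)
                FindAndDropRelFileNodeBuffers(smgr_reln->smgr_rnode.node,
                                              forkNum[i], nForkBlock[i],
                                              firstDelBlock[i]);
            return;
        }

        /* Otherwise fall back to the existing full scan over NBuffers. */
    }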

I think this data is not very clear. What is the unit of time? What is
the size of the relation used for the test? Did the test use the
optimized path in all cases? If there is no performance gain at 128MB,
can we consider a shared buffers size of 256MB as well for the
threshold?
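
Also, regarding point (3) in your list above, the per-block lookup in
FindAndDropRelFileNodeBuffers that I have in mind is roughly the following:
no RelFileNode pre-check before taking the buffer header lock, only the
usual recheck of the tag after LockBufHdr. Again, this is just a sketch of
the idea using existing bufmgr primitives, not the actual patch code:

    static void
    FindAndDropRelFileNodeBuffers(RelFileNode rnode, ForkNumber forkNum,
                                  BlockNumber nForkBlock,
                                  BlockNumber firstDelBlock)
    {
        BlockNumber curBlock;

        for (curBlock = firstDelBlock; curBlock < nForkBlock; curBlock++)
        {
            BufferTag   bufTag;
            uint32      bufHash;
            LWLock     *partitionLock;
            int         buf_id;
            BufferDesc *bufHdr;
            uint32      buf_state;

            /* create a tag so we can look up the buffer */
            INIT_BUFFERTAG(bufTag, rnode, forkNum, curBlock);

            /* determine its hash code and partition lock ID */
            bufHash = BufTableHashCode(&bufTag);
            partitionLock = BufMappingPartitionLock(bufHash);

            /* Check whether the block is in the buffer pool at all. */
            LWLockAcquire(partitionLock, LW_SHARED);
            buf_id = BufTableLookup(&bufTag, bufHash);
            LWLockRelease(partitionLock);

            if (buf_id < 0)
                continue;

            bufHdr = GetBufferDescriptor(buf_id);

            /*
             * The buffer could have been evicted and reused after we
             * released the mapping lock, so lock the header and recheck
             * the tag before invalidating.
             */
            buf_state = LockBufHdr(bufHdr);

            if (RelFileNodeEquals(bufHdr->tag.rnode, rnode) &&
                bufHdr->tag.forkNum == forkNum &&
                bufHdr->tag.blockNum >= firstDelBlock)
                InvalidateBuffer(bufHdr);   /* releases spinlock */
            else
                UnlockBufHdr(bufHdr, buf_state);
        }
    }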

--
With Regards,
Amit Kapila.
