Re: Speed up Clog Access by increasing CLOG buffers

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2015-09-03 11:41:37
Message-ID: 20150903114137.GE27649@awork2.anarazel.de
Lists: pgsql-hackers

On 2015-09-01 10:19:19 +0530, Amit Kapila wrote:
> pgbench setup
> ------------------------
> scale factor - 300
> Data is on magnetic disk and WAL on ssd.
> pgbench -M prepared tpc-b
>
> HEAD - commit 0e141c0f
> Patch-1 - increase_clog_bufs_v1
>
> Client Count/Patch_ver      1      8     16     32     64    128    256
> HEAD                      911   5695   9886  18028  27851  28654  25714
> Patch-1                   954   5568   9898  18450  29313  31108  28213
>
>
> This data shows an increase of ~5% at 64 clients and 8~10% at higher
> client counts, without degradation at lower client counts. There is some
> fluctuation at 8 clients, which I attribute to run-to-run variation;
> however, if anybody has doubts, I can re-verify the data at lower client
> counts.
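
For reference, the relative changes implied by the table can be recomputed with a few lines of C. This is only a sketch: the figures are copied from the table above, and the array and function names are made up for it.

```c
#include <stddef.h>
#include <stdio.h>

/* pgbench results copied from the table above, one entry per client count. */
static const int sketch_clients[] = {1, 8, 16, 32, 64, 128, 256};
static const double sketch_head[] = {911, 5695, 9886, 18028, 27851, 28654, 25714};
static const double sketch_patch1[] = {954, 5568, 9898, 18450, 29313, 31108, 28213};

#define SKETCH_NCOUNTS (sizeof(sketch_clients) / sizeof(sketch_clients[0]))

/* Relative change of a patched result over the HEAD result, in percent. */
static double pct_change(double head, double patched)
{
    return (patched - head) / head * 100.0;
}

/* Print the change of Patch-1 relative to HEAD for every client count. */
static void print_pct_changes(void)
{
    for (size_t i = 0; i < SKETCH_NCOUNTS; i++)
        printf("%3d clients: %+5.1f%%\n", sketch_clients[i],
               pct_change(sketch_head[i], sketch_patch1[i]));
}
```

Called from main(), print_pct_changes() gives about +5.2% at 64 clients, +8.6% at 128 and +9.7% at 256, in line with the ~5% and 8~10% quoted above; it also shows -2.2% at 8 clients and +4.7% at 1 client, so the low client counts move in both directions.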

> Now if we try to further increase the number of CLOG buffers to 128,
> no improvement is seen.
>
> I have also verified that this improvement can be seen only after the
> contention around ProcArrayLock is reduced. Below is the data with the
> commit before the ProcArrayLock reduction patch. The setup and test are
> the same as for the previous test.

The buffer replacement algorithm for clog is rather stupid - I do wonder
where the cutoff is beyond which it starts to hurt.
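
To spell out what I mean, here is a deliberately simplified sketch of the kind of scheme the SLRU code uses: a linear scan over all slots to find the page, and on a miss a second look at every slot to pick an empty or least recently used victim. The struct, function and macro names are hypothetical, and the real code additionally deals with dirty pages, I/O in progress and the latest page, none of which is modeled here.

```c
#include <limits.h>

#define SKETCH_NUM_SLOTS 8   /* hypothetical; CLOG sizes this via CLOGShmemBuffers() */
#define SKETCH_EMPTY (-1)

typedef struct
{
    int page_number[SKETCH_NUM_SLOTS]; /* page cached in each slot, or SKETCH_EMPTY */
    int lru_count[SKETCH_NUM_SLOTS];   /* cur_lru_count at the slot's last use */
    int cur_lru_count;                 /* bumped on every access */
} SlruSketch;

static void slru_sketch_init(SlruSketch *s)
{
    for (int i = 0; i < SKETCH_NUM_SLOTS; i++)
    {
        s->page_number[i] = SKETCH_EMPTY;
        s->lru_count[i] = 0;
    }
    s->cur_lru_count = 0;
}

/*
 * Return the slot caching pageno and set *hit to 1 if it was already cached.
 * On a miss the page takes the first empty slot, else the least recently used
 * one.  Every access scans all slots, so its cost grows linearly with
 * SKETCH_NUM_SLOTS.  Writing out a dirty victim and reading the new page from
 * disk are omitted.
 */
static int slru_sketch_access(SlruSketch *s, int pageno, int *hit)
{
    int victim = 0;
    int best_age = -1;

    s->cur_lru_count++;
    for (int i = 0; i < SKETCH_NUM_SLOTS; i++)
    {
        int age;

        if (s->page_number[i] == pageno)
        {
            s->lru_count[i] = s->cur_lru_count;
            *hit = 1;
            return i;
        }
        /* Empty slots win outright; otherwise older means a better victim. */
        age = (s->page_number[i] == SKETCH_EMPTY)
            ? INT_MAX
            : s->cur_lru_count - s->lru_count[i];
        if (age > best_age)
        {
            best_age = age;
            victim = i;
        }
    }

    s->page_number[victim] = pageno;
    s->lru_count[victim] = s->cur_lru_count;
    *hit = 0;
    return victim;
}
```

The linear scan is cheap with a handful of slots, but it is paid on every lookup, which is why simply raising the slot count is not obviously free.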

Could you perhaps try to create a test case where xids are accessed that
are so far apart on average that they're unlikely to be in memory? And
then test that across a number of client counts?
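
A sketch of the access pattern I have in mind, assuming the default 8kB BLCKSZ (2 status bits per xid, so 32768 xids per CLOG page) and ignoring xid wraparound and the reserved special xids. The names are hypothetical, not PostgreSQL's. The idea is to look up xids that land on more distinct CLOG pages than there are buffers.

```c
#include <stdint.h>

/* Assumes the default 8kB block size; CLOG stores 2 status bits per xid. */
#define SKETCH_BLCKSZ 8192
#define SKETCH_XACTS_PER_BYTE 4
#define SKETCH_XACTS_PER_PAGE (SKETCH_BLCKSZ * SKETCH_XACTS_PER_BYTE) /* 32768 */

typedef uint32_t SketchXid; /* hypothetical stand-in for TransactionId */

/* CLOG page that holds the commit status of xid. */
static uint32_t clog_page_for_xid(SketchXid xid)
{
    return xid / SKETCH_XACTS_PER_PAGE;
}

/*
 * The i-th xid of a hypothetical adversarial pattern: step back from
 * newest_xid one CLOG page at a time, cycling over npages distinct pages.
 * Under pure LRU with fewer than npages buffers, every lookup misses,
 * because each page is evicted just before it is needed again.
 * Assumes newest_xid >= (npages - 1) * SKETCH_XACTS_PER_PAGE (no wraparound).
 */
static SketchXid cyclic_old_xid(SketchXid newest_xid, uint32_t i, uint32_t npages)
{
    uint32_t back = (i % npages) * SKETCH_XACTS_PER_PAGE;

    return newest_xid - back;
}
```

Varying npages around the configured number of CLOG buffers, at several client counts, should show where the replacement policy starts to hurt.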

There are two reasons I'd like to see that: first, I'd like to avoid
regressions; second, I'd like to avoid having to bump the maximum number
of buffers by small increments after every hardware generation...

> /*
> * Number of shared CLOG buffers.
> *
> - * Testing during the PostgreSQL 9.2 development cycle revealed that on a
> + * Testing during the PostgreSQL 9.6 development cycle revealed that on a
> * large multi-processor system, it was possible to have more CLOG page
> - * requests in flight at one time than the number of CLOG buffers which existed
> - * at that time, which was hardcoded to 8. Further testing revealed that
> - * performance dropped off with more than 32 CLOG buffers, possibly because
> - * the linear buffer search algorithm doesn't scale well.
> + * requests in flight at one time than the number of CLOG buffers which
> + * existed at that time, which was 32 assuming there are enough shared_buffers.
> + * Further testing revealed that either performance stayed same or dropped off
> + * with more than 64 CLOG buffers, possibly because the linear buffer search
> + * algorithm doesn't scale well or some other locking bottlenecks in the
> + * system mask the improvement.
> *
> - * Unconditionally increasing the number of CLOG buffers to 32 did not seem
> + * Unconditionally increasing the number of CLOG buffers to 64 did not seem
> * like a good idea, because it would increase the minimum amount of shared
> * memory required to start, which could be a problem for people running very
> * small configurations. The following formula seems to represent a reasonable
> * compromise: people with very low values for shared_buffers will get fewer
> - * CLOG buffers as well, and everyone else will get 32.
> + * CLOG buffers as well, and everyone else will get 64.
> *
> * It is likely that some further work will be needed here in future releases;
> * for example, on a 64-core server, the maximum number of CLOG requests that
> * can be simultaneously in flight will be even larger. But that will
> * apparently require more than just changing the formula, so for now we take
> - * the easy way out.
> + * the easy way out. It could also happen that after removing other locking
> + * bottlenecks, further increase in CLOG buffers can help, but that's not the
> + * case now.
> */
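
To make the formula the comment talks about concrete: assuming CLOGShmemBuffers() in clog.c currently computes Min(32, Max(4, NBuffers / 512)) and the patch only raises the cap, a sketch with the cap as a parameter looks like this. The function and macro names are made up for the sketch.

```c
/* Local stand-ins for PostgreSQL's Min()/Max() macros. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/*
 * Number of CLOG buffers for a given NBuffers (shared_buffers in blocks):
 * one CLOG buffer per 512 shared buffers, at least 4, at most the cap.
 */
static int clog_shmem_buffers_sketch(int nbuffers, int max_clog_buffers)
{
    return SKETCH_MIN(max_clog_buffers, SKETCH_MAX(4, nbuffers / 512));
}
```

With 8kB blocks, shared_buffers = 128MB is NBuffers = 16384, which yields 32 buffers under either cap. A cap of 64 is only reached from NBuffers = 32768 (256MB) upward, so the patch changes nothing for smaller configurations.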

I think the comment should be more drastically rephrased to not
reference individual versions and numbers.

Greetings,

Andres Freund
