Re: Speed up Clog Access by increasing CLOG buffers

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2016-09-17 01:24:33
Message-ID: 3bb2699f-fe51-7419-b42b-9ad5bfd0d506@2ndquadrant.com
Lists: pgsql-hackers

On 09/14/2016 06:04 PM, Dilip Kumar wrote:
> On Wed, Sep 14, 2016 at 8:59 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Sure, but you're testing at *really* high client counts here. Almost
>> nobody is going to benefit from a 5% improvement at 256 clients.
>
> I agree with your point, but here we need to consider one more thing,
> that on head we are gaining ~30% with both the approaches.
>
> So for comparing these two patches we can consider..
>
> A. Other workloads (one can be as below)
> -> Load on CLogControlLock at commit (exclusive mode) + Load on
> CLogControlLock at Transaction status (shared mode).
> I think we can mix (savepoint + updates)
>
> B. Simplicity of the patch (if both are performing almost equal in all
> practical scenarios).
>
> C. Based on whichever algorithm seems to be the winner.
>
> I will try to test these patches with other workloads...
>
>> You
>> need to test 64 clients and 32 clients and 16 clients and 8 clients
>> and see what happens there. Those cases are a lot more likely than
>> these stratospheric client counts.
>
> I tested with 64 clients as well..
> 1. On head we are gaining ~15% with both the patches.
> 2. But group lock vs. granular lock perform almost the same.
>

I've been doing some testing too, but I haven't managed to measure any
significant difference between master and any of the patches. Not sure
why; I've repeated the test from scratch to make sure I hadn't done
anything stupid, but I got the same results (which is one of the main
reasons the testing took me so long).

Attached is an archive with a script running the benchmark (including
SQL scripts generating the data and custom transaction for pgbench), and
results in a CSV format.

The benchmark is fairly simple - for each case (master + 3 different
patches) we do 10 runs, 5 minutes each, for 32, 64, 128 and 192 clients
(the machine has 32 physical cores).
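For reference, the driver loop looks roughly like this (a sketch reconstructed from the description above, not the attached script; the build labels, transaction file name and pgbench flags are assumptions):

```shell
#!/bin/sh
# Sketch of the benchmark matrix described above (assumed names, not the
# attached script): 4 builds x 4 client counts x 10 runs of 5 minutes each.
# This dry-run version only prints the pgbench invocations it would make.
n=0
for build in master patch-1 patch-2 patch-3; do
    for clients in 32 64 128 192; do
        for run in $(seq 1 10); do
            # -n: skip vacuum, -f: custom transaction file, -T 300: 5 minutes
            echo "pgbench -n -f savepoints.sql -c $clients -j $clients -T 300  # $build, run $run"
            n=$((n + 1))
        done
    done
done
echo "$n runs total"
```

That is 160 pgbench runs of 5 minutes each, which matches the "testing took me so long" remark above.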

The transaction is using a single unlogged table initialized like this:

create unlogged table t(id int, val int);
insert into t select i, i from generate_series(1,100000) s(i);
vacuum t;
create index on t(id);

(I've also run it with 100M rows, labeled "large" in the results), and
pgbench is running this transaction:

\set id random(1, 100000)

BEGIN;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s1;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s2;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s3;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s4;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s5;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s6;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s7;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s8;
COMMIT;

So, 8 simple UPDATEs interleaved with savepoints. The benchmark ran
on a machine with 256GB of RAM, 32 cores (4x E5-4620) and a fairly large
SSD array. I'd done some basic tuning on the system, most importantly:

effective_io_concurrency = 32
work_mem = 512MB
maintenance_work_mem = 512MB
max_connections = 300
checkpoint_completion_target = 0.9
checkpoint_timeout = 3600
max_wal_size = 128GB
min_wal_size = 16GB
shared_buffers = 16GB

Most of these changes probably don't matter much for unlogged tables,
though (I planned to see how this affects regular tables, but as I see
no difference for unlogged ones, I haven't done that yet).
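One thing worth keeping in mind about this workload: each savepoint starts a subtransaction, and each subtransaction gets its own XID with its own CLOG status bits, so the transaction above consumes XIDs (and thus fills CLOG pages) about 9x faster than a plain single-statement update. A back-of-the-envelope calculation, assuming the default 8kB block size and the standard 2 status bits per transaction in CLOG:

```python
# CLOG stores 2 status bits per transaction, i.e. 4 transactions per byte.
BLCKSZ = 8192                    # default PostgreSQL block size
XACTS_PER_PAGE = BLCKSZ * 4      # transaction statuses per CLOG page

# The pgbench transaction above: 1 top-level XID + 8 subtransaction XIDs.
xids_per_txn = 1 + 8

txns_per_clog_page = XACTS_PER_PAGE // xids_per_txn
print(XACTS_PER_PAGE)        # 32768 statuses per page
print(txns_per_clog_page)    # ~3640 of these transactions per CLOG page
```

In other words, at tens of thousands of TPS the benchmark moves to a fresh CLOG page every fraction of a second, which is exactly the access pattern these patches target.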

So the question is why Dilip sees a ~30% improvement, while my results
are almost exactly the same. Looking at Dilip's benchmark, I see he only
ran the test for 10 seconds, and I'm not sure how many runs he did, what
warmup he used, etc. Dilip, can you provide additional info?

I'll ask someone else to redo the benchmark after the weekend to make
sure it's not actually some stupid mistake of mine.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
clog.tgz application/x-compressed-tar 4.3 KB
