Re: checkpointer continuous flushing

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-06-23 04:15:33
Message-ID: CAA4eK1KV_ts-CBbTtSeDnc5OPXgXM9C0AyLuXGZ+eRyw=LTevA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 22, 2015 at 1:41 PM, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
>
> <sorry, resent stalled post, wrong from>
>
>> It'd be interesting to see numbers for tiny, without the overly small
>> checkpoint timeout value. 30s is below the OS's writeback time.
>
>
> Here are some tests with longer timeout:
>
> tiny2: scale=10 shared_buffers=1GB checkpoint_timeout=5min
> max_wal_size=1GB warmup=600 time=4000
>
> flsh |     full speed tps      | percent of late tx, 4 clients, for tps:
> /srt | 1 client   | 4 clients   |  100 |  200 |  400 |  800 | 1200 | 1600
>  N/N | 930 +- 124 | 2560 +- 394 | 0.70 | 1.03 | 1.27 | 1.56 | 2.02 | 2.38
>  N/Y | 924 +- 122 | 2612 +- 326 | 0.63 | 0.79 | 0.94 | 1.15 | 1.45 | 1.67
>  Y/N | 907 +- 112 | 2590 +- 315 | 0.58 | 0.83 | 0.68 | 0.71 | 0.81 | 1.26
>  Y/Y | 915 +- 114 | 2590 +- 317 | 0.60 | 0.68 | 0.70 | 0.78 | 0.88 | 1.13
>
> There seems to be a small 1-2% performance benefit with 4 clients; this is
> reversed for 1 client. There are significantly and consistently fewer late
> transactions when the options are activated, and the performance is more
> stable (standard deviation reduced by 10-18%).
>
> The db is about 200 MB (~ 25000 pages); at 2500+ tps it is written 40 times
> over in 5 minutes, so the checkpoint basically writes everything in 220
> seconds, at 0.9 MB/s. Given the preload phase, the buffers may be more or
> less in order in memory, so they may be written out in order anyway.
>
>
> medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min
> max_wal_size=4GB warmup=1200 time=7500
>
> flsh | full speed tps | percent of late tx, 4 clients
> /srt | 1 client | 4 clients | 100 | 200 | 400 |
> N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
> N/Y | 458 +- 327* | 743 +- 920* | 7.05 | 14.24 | 24.07 |
> Y/N | 169 +- 166* | 187 +- 302* | 4.01 | 39.84 | 65.70 |
> Y/Y | 546 +- 143 | 681 +- 459 | 1.55 | 3.51 | 2.84 |
>
> The effect of sorting is very positive (+150% to +270% tps). On this run,
> flushing has a positive (+20% with 1 client) or negative (-8% with 4
> clients) effect on throughput, and late transactions are reduced by 92-95%
> when both options are activated.
>

Why is there a dip in performance with multiple clients?  Could it be
because we have started doing more work while holding the bufhdr lock,
as in the code below?

BufferSync()
{
..
    for (buf_id = 0; buf_id < NBuffers; buf_id++)
    {
        volatile BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
@@ -1621,32 +1719,185 @@ BufferSync(int flags)

        if ((bufHdr->flags & mask) == mask)
        {
+           Oid     spc;
+           TableSpaceCountEntry *entry;
+           bool    found;
+
            bufHdr->flags |= BM_CHECKPOINT_NEEDED;
+           CheckpointBufferIds[num_to_write] = buf_id;
            num_to_write++;
+
+           /* keep track of per tablespace buffers */
+           spc = bufHdr->tag.rnode.spcNode;
+           entry = (TableSpaceCountEntry *)
+               hash_search(spcBuffers, (void *) &spc, HASH_ENTER, &found);
+
+           if (found)
+               entry->count++;
+           else
+               entry->count = 1;
        }
..
}
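
For illustration, here is a rough sketch (not part of the quoted patch) of
one way the per-tablespace accounting could be kept outside the buffer
header spinlock: only the tablespace oid is copied while the lock is held,
and the hash_search() happens after the lock is released.  The
LockBufHdr()/UnlockBufHdr() calls, which the hunk above does not show, are
added here for clarity.

    LockBufHdr(bufHdr);

    if ((bufHdr->flags & mask) == mask)
    {
        Oid     spc;
        TableSpaceCountEntry *entry;
        bool    found;

        /* only cheap work while the spinlock is held */
        bufHdr->flags |= BM_CHECKPOINT_NEEDED;
        CheckpointBufferIds[num_to_write] = buf_id;
        num_to_write++;
        spc = bufHdr->tag.rnode.spcNode;

        UnlockBufHdr(bufHdr);

        /* hash maintenance moved outside the spinlock */
        entry = (TableSpaceCountEntry *)
            hash_search(spcBuffers, (void *) &spc, HASH_ENTER, &found);
        if (found)
            entry->count++;
        else
            entry->count = 1;
    }
    else
        UnlockBufHdr(bufHdr);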

-
BufferSync()
{
..
-   buf_id = StrategySyncStart(NULL, NULL);
-   num_to_scan = NBuffers;
+   active_spaces = nb_spaces;
+   space = 0;
    num_written = 0;
-   while (num_to_scan-- > 0)
+
+   while (active_spaces != 0)
..
}

The changed code doesn't seem to give any consideration to the clock-sweep
point, which might not be helpful in cases where the checkpoint could have
flushed soon-to-be-recycled buffers.  I think flushing the sorted buffers
with respect to tablespaces is a good idea, but giving no preference to the
clock-sweep point means we could lose out in some cases with this new
change.
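
For reference, here is a condensed sketch of what the removed lines buy us:
the write loop starts wherever the clock sweep currently is and wraps
around, so the buffers the sweep will reach (and possibly evict) soonest
are written first.  Details of the real loop, such as the
BM_CHECKPOINT_NEEDED test and the CheckpointWriteDelay() throttling, are
omitted here.

    buf_id = StrategySyncStart(NULL, NULL);   /* current clock-sweep position */
    num_to_scan = NBuffers;
    num_written = 0;

    while (num_to_scan-- > 0)
    {
        /* buffers nearest the sweep point are flushed first */
        if (SyncOneBuffer(buf_id, false) & BUF_WRITTEN)
            num_written++;

        if (++buf_id >= NBuffers)
            buf_id = 0;                       /* wrap around the buffer array */
    }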

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
