Re: Handing off SLRU fsyncs to the checkpointer

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Handing off SLRU fsyncs to the checkpointer
Date: 2020-08-05 06:00:17
Message-ID: CA+hUKG+WPWgLyO59+ADoDQ9ar0mpiH3jFYqJXSKpvx6igA16Pg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 4, 2020 at 6:02 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> ... speedup of around 6% ...

I did some better testing. OS: Linux, storage: consumer SSD. I
repeatedly ran crash recovery on 3.3GB worth of WAL generated with 8M
pgbench transactions. I tested 3 different builds 7 times each and
used "ministat" to compare the recovery times. It told me that:

* Master is around 11% faster than it was last week, before commit c5315f4f
"Cache smgrnblocks() results in recovery."
* This patch gives a similar speedup, bringing the total to around 25%
faster than last week (the time is ~20% less, the WAL processing speed
is ~1.25x).
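
The two ways of stating the improvement (less time versus higher processing speed) can be checked against the ministat averages reported at the end of this message. This is only a sketch of the arithmetic; the helper names `time_saved` and `speed_ratio` are illustrative and not part of any tool used in the test.

```python
# Average recovery times in seconds, copied from the ministat output below.
avg_seconds = {
    "patched": 39.134857,
    "master": 43.815286,
    "lastweek": 48.904286,
}


def time_saved(old: float, new: float) -> float:
    """Fraction of wall-clock time saved when going from old to new."""
    return 1.0 - new / old


def speed_ratio(old: float, new: float) -> float:
    """How many times faster the same WAL is processed (old time / new time)."""
    return old / new


for build in ("master", "patched"):
    saved = time_saved(avg_seconds["lastweek"], avg_seconds[build])
    ratio = speed_ratio(avg_seconds["lastweek"], avg_seconds[build])
    print(f"{build} vs lastweek: {saved:.1%} less time, {ratio:.2f}x speed")
```

A 20% reduction in time corresponds to a 1.25x processing speed because 1 / (1 - 0.20) = 1.25, which is why the two figures quoted above differ.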

My test fit in RAM and was all cached. With the patch, the recovery
process used 100% of a single core the whole time, stayed on that
core, and showed low variance. In the other builds, CPU usage hovered
around 90%, the process hopped between cores as it kept getting
rescheduled, and the variance was higher.

Of course, SLRU fsyncs aren't the only I/O stalls in a real system;
among others, there are also reads from faulting in referenced pages
that don't have full page images in the WAL. I'm working on that
separately, but that's a tad more complicated than this stuff.

Added to commit fest.

=== ministat output showing recovery times in seconds ===

x patched.dat
+ master.dat
* lastweek.dat
[ministat ASCII distribution plot; column alignment was lost in the
archive. The three samples do not overlap: patched is lowest, master
is in the middle, lastweek is highest.]

    N           Min           Max        Median           Avg        Stddev
x   7        38.655        39.406        39.218     39.134857    0.25188849
+   7        42.128        45.068        43.958     43.815286    0.91387758
Difference at 95.0% confidence
        4.68043 +/- 0.780722
        11.9597% +/- 1.99495%
        (Student's t, pooled s = 0.670306)
*   7        47.187        49.404        49.203     48.904286    0.76793483
Difference at 95.0% confidence
        9.76943 +/- 0.665613
        24.9635% +/- 1.70082%
        (Student's t, pooled s = 0.571477)
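
For readers who want to reproduce the "Difference at 95.0% confidence" lines, here is a minimal sketch of a pooled two-sample Student's t interval computed from the summary statistics alone. It assumes the standard pooled-variance formula and a two-sided 95% critical value of 2.179 for 12 degrees of freedom, taken from a standard t-table; the function name `pooled_difference` is hypothetical and this is not ministat's source code.

```python
import math


def pooled_difference(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """Return (difference, CI half-width, pooled stddev) for two samples.

    Sample 1 is the baseline; the difference is mean2 - mean1.
    """
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    half_width = t_crit * pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return mean2 - mean1, half_width, pooled_sd


# Assumption: two-sided 95% critical value for 7 + 7 - 2 = 12 degrees of freedom.
T_CRIT_DF12 = 2.179

# patched (x) as baseline versus master (+), values from the table above.
diff, half_width, pooled_sd = pooled_difference(
    7, 39.134857, 0.25188849,
    7, 43.815286, 0.91387758,
    T_CRIT_DF12,
)
baseline_mean = 39.134857
print(f"{diff:.5f} +/- {half_width:.6f}")
print(f"{diff / baseline_mean:.4%} +/- {half_width / baseline_mean:.5%}")
print(f"pooled s = {pooled_sd:.6f}")
```

Note that the percentages are relative to the patched average, so the 24.9635% figure means lastweek took about 25% longer than patched, which matches the ~1.25x speed figure quoted earlier.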
