Re: checkpointer continuous flushing

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-09-08 14:39:05
Message-ID: alpine.DEB.2.10.1509081531300.25033@sto
Lists: pgsql-hackers

Hello Amit,

> I have done some tests with both the patches(sort+flush) and below
> are results:

Thanks a lot for these runs on this great hardware!

> Test - 1 (Data Fits in shared_buffers)

Rounded for easier comparison:

off off: 27480.4 ± 12791.1 [ 0, 16009, 32109, 37629, 51671] (2.8%)
off on : 27482.5 ± 12552.0 [ 0, 16587, 31226, 37516, 51297] (2.8%)

The two cases above are practically indistinguishable: sorting alone has no
impact. The 2.8% means more than 1 minute offline per hour (not necessarily
a whole minute at once, it may be distributed over the whole hour).
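
As a quick check of that arithmetic, here is a minimal sketch. It assumes
the percentage in parentheses is the share of time during which pg was
effectively unavailable; the function name is made up for illustration.

```python
def offline_seconds_per_hour(pct_unavailable):
    """Convert an unavailability percentage into seconds lost per hour."""
    return pct_unavailable / 100.0 * 3600.0


if __name__ == "__main__":
    # 2.8% is the figure from the off-off and off-on runs of Test 1.
    seconds = offline_seconds_per_hour(2.8)
    print(f"{seconds:.0f} s per hour, i.e. {seconds / 60.0:.2f} min")
```

With 2.8% this gives roughly 100 seconds, hence "more than 1 minute".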

on off: 25214.8 ± 11059.7 [5268, 14188, 26472, 35626, 51479] (0.0%)
on on : 26819.6 ± 10589.7 [5192, 16825, 29430, 35708, 51475] (0.0%)

> For this test run, the best results are when both the sort and flush
> options are enabled, the value of lowest TPS is increased substantially
> without sacrificing much on average or median TPS values (though there
> is ~9% dip in median TPS value). When only sorting is enabled, there is
> neither significant gain nor any loss. When only flush is enabled,
> there is significant degradation in both average and median value of TPS
> ~8% and ~21% respectively.

I interpret the five numbers in brackets as an indicator of performance
stability: they should be equal for perfect stability. Once they show some
stability, the next point for me is to focus on the average performance. I
do not see a median decrease as a big issue if the average is reasonably
good.
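
For readers who want to produce the same kind of summary line from their
own per-second tps samples, a minimal sketch follows. It assumes that the
five numbers are min, first quartile, median, third quartile and max, and
that the percentage counts the seconds with tps below 10 (the threshold
mentioned in the quoted text). The function name and the quantile method
are my own choices, not the script used for the figures above.

```python
import statistics


def summarize_tps(samples, low_threshold=10.0):
    """Summarize per-second tps samples as avg, stddev, five numbers, % low."""
    ordered = sorted(samples)
    # Quartiles computed with the "inclusive" method (an arbitrary choice here).
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    low = sum(1 for s in ordered if s < low_threshold)
    return {
        "avg": statistics.fmean(ordered),
        "stddev": statistics.pstdev(ordered),
        "five": (ordered[0], q1, median, q3, ordered[-1]),
        "pct_low": 100.0 * low / len(ordered),
    }


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    s = summarize_tps([0, 100, 200, 300, 400])
    print(f"{s['avg']:.1f} ± {s['stddev']:.1f} {list(s['five'])} ({s['pct_low']:.1f}%)")
```

The closer the five numbers are to each other, the more stable the run.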

Thus I mainly note the -2.5% dip in average tps of on-on vs off-on. I
would say that it is probably significant, although it might be within the
error margin of the measurement. I am not sure whether the small stddev
reduction is really significant. Anyway, the benefit is clear: 100%
availability.

Flushing without sorting is a bad idea (tm), not a surprise.

> Test - 2 (Data doesn't fit in shared_buffers, but fits in RAM)

off off: 5050.1 ± 4884.5 [ 0, 98, 4699, 10126, 13631] ( 7.7%)
off on : 6194.2 ± 4913.5 [ 0, 98, 8982, 10558, 14035] (11.0%)
on off: 2771.3 ± 1861.0 [ 288, 2039, 2375, 2679, 12862] ( 0.0%)
on on : 6110.6 ± 1939.3 [1652, 5215, 5724, 6196, 13828] ( 0.0%)

I'm not sure that the off-on vs on-on -1.3% avg tps dip is significant,
but it may be. With both flushing and sorting pg becomes fully available,
and the standard deviation is divided by more than 2, so the benefit is
clear.
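
The dips and the stddev ratio can be recomputed from the rounded figures
above with a few lines. This is only a sketch and the helper name is
hypothetical; since the inputs are rounded, the results may differ slightly
from percentages computed on the raw data.

```python
def relative_change_pct(new, ref):
    """Relative change of new with respect to ref, in percent."""
    return (new - ref) / ref * 100.0


if __name__ == "__main__":
    # Average tps, on-on vs off-on, taken from the rounded tables above.
    print(f"Test 1: {relative_change_pct(26819.6, 27482.5):+.1f}%")
    print(f"Test 2: {relative_change_pct(6110.6, 6194.2):+.1f}%")
    # Standard deviation, off-on vs on-on, for Test 2.
    print(f"Test 2 stddev ratio: {4913.5 / 1939.3:.2f}")
```

Both dips come out at a few percent at most, while the Test 2 standard
deviation shrinks by a factor of about 2.5.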

> For this test run, again the best results are when both the sort and flush
> options are enabled, the value of lowest TPS is increased substantially
> and the average and median value of TPS has also increased to
> ~21% and ~22% respectively. When only sorting is enabled, there is a
> significant gain in average and median TPS values, but then there is also
> an increase in number of times when TPS is below 10 which is bad.
> When only flush is enabled, there is significant degradation in both average
> and median value of TPS to ~82% and ~97% respectively, now I am not
> sure if such a big degradation could be expected for this case or it's just
> a problem in this run, I have not repeated this test.

Yes, I agree that it is strange that sorting without flushing improves
performance (+20% tps) yet seems to degrade availability at the same time.
A rerun would have helped to check whether it is a fluke or whether it is
reproducible.

> Test - 3 (Data doesn't fit in shared_buffers, but fits in RAM)
> ----------------------------------------------------------------------------------------
> Same configuration and settings as above, but this time, I have enforced
> Flush to use posix_fadvise() rather than sync_file_range() (basically
> changed code to comment out sync_file_range() and enable posix_fadvise()).
> On using posix_fadvise(), the results for best case (both flush and sort as
> on) shows significant degradation in average and median TPS values
> by ~48% and ~43% which indicates that probably using posix_fadvise()
> with the current options might not be the best way to achieve Flush.

Yes, indeed.

The way posix_fadvise is implemented on Linux is between no effect and bad
effect (the buffer is erased). You hit the latter quite strongly... As you
are doing a "not fit in shared buffer" test, it is essential that buffers
are kept in ram, but posix_fadvise on Linux just instructs to erase the
buffer from memory if it was already passed to the I/O subsystem, which
given the probably large I/O device cache on your host should be done
pretty quickly, so that later reads must be fetched back from the device
(either cache or disk), which means a drop in performance.
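
To make the two calls concrete, here is a rough sketch of how each hint can
be issued. Several things are my assumptions, not taken from the patch: the
POSIX_FADV_DONTNEED advice, the SYNC_FILE_RANGE_WRITE flag and its numeric
value, the helper names, and the order of preference. As far as I know
Python's os module does not expose sync_file_range, hence the ctypes call.

```python
import ctypes
import os
import tempfile

# Assumption: flag value as defined in Linux <fcntl.h>.
SYNC_FILE_RANGE_WRITE = 2


def start_writeback(fd, offset, nbytes):
    """Linux: ask the kernel to start writing the range out, pages stay cached.

    Returns True if sync_file_range() was available and succeeded.
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        func = libc.sync_file_range
    except (OSError, AttributeError, TypeError):
        return False  # not Linux, or the C library does not export the symbol
    func.argtypes = [ctypes.c_int, ctypes.c_int64, ctypes.c_int64, ctypes.c_uint]
    func.restype = ctypes.c_int
    return func(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) == 0


def advise_dontneed(fd, offset, nbytes):
    """Tell the kernel that the range is not needed in its cache anymore.

    Returns True if posix_fadvise() was available and succeeded.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    try:
        os.posix_fadvise(fd, offset, nbytes, os.POSIX_FADV_DONTNEED)
    except OSError:
        return False
    return True


def write_page(path, data):
    """Write data at offset 0, then hint the kernel; return the hint used."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.write(fd, data)
        if start_writeback(fd, 0, len(data)):
            return "sync_file_range"
        if advise_dontneed(fd, 0, len(data)):
            return "posix_fadvise"
        return "none"
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_page(os.path.join(tmp, "page.bin"), b"x" * 8192))
```

The point of the sketch is the difference in intent: the first call only
starts the write, the second one also says that the cached copy may go.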

Note that the FreeBSD implementation seems more convincing, although not as
good as the Linux sync_file_range function. I have no idea about other
systems.

> Overall, I think this patch (sort+flush) brings a lot of value on table
> in terms of stabilizing the TPS during checkpoint, however some of the
> cases like use of posix_fadvise() and the case (all data fits in
> shared_buffers) where the value of median TPS is regressed could be
> investigated to see what can be done to improve them. I think more
> tests can be done to ensure the benefit or regression of this patch, but
> for now this is what best I can do.

Thanks a lot, again, for these tests!

I think that we may conclude, from these runs:

(1) sorting seems not to harm performance, and may help a lot.

(2) Linux flushing with sync_file_range may slightly degrade the raw tps
average in some cases, but definitely improves performance stability
(always 100% availability when on!).

(3) posix_fadvise on Linux is a bad idea... the good news is that it
is not needed there :-) How good or bad an idea it is on other systems
is an open question...

These results are consistent with the current default values in the patch:
sorting is on by default, flushing is on with Linux and off otherwise.

Also, as the effect on other systems is unclear, I think it is best to
keep both settings as GUCs for now.
