
Re: should crash recovery ignore checkpoint_flush_after ?

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Subject:	Re: should crash recovery ignore checkpoint_flush_after ?
Date:	2020-01-19 16:13:57
Message-ID:	20200119161357.GR26045@telsasoft.com
Lists:	pgsql-hackers

On Sat, Jan 18, 2020 at 03:32:02PM -0800, Andres Freund wrote:
> On 2020-01-19 09:52:21 +1300, Thomas Munro wrote:
> > On Sun, Jan 19, 2020 at 3:08 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > Does sync_file_range() even do anything for non-mmap'd files on ZFS?
>
> Good point. Next time it might be worthwhile to use strace -T to see
> whether the sync_file_range calls actually take meaningful time.

> Yea, it requires the pages to be in the pagecache to do anything:

> 	if (!mapping_cap_writeback_dirty(mapping) ||
> 	    !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
> 		return 0;

That logic is actually brand new (Sep 23, 2019, Linux 5.4):
https://github.com/torvalds/linux/commit/c3aab9a0bd91b696a852169479b7db1ece6cbf8c#diff-fd2d793b8b4760b4887c8c7bbb3451d7

Running a manual CHECKPOINT under strace -T, I saw calls like:

sync_file_range(0x15f, 0x1442c000, 0x2000, 0x2) = 0 <2.953956>
sync_file_range(0x15f, 0x14430000, 0x4000, 0x2) = 0 <0.006395>
sync_file_range(0x15f, 0x14436000, 0x4000, 0x2) = 0 <0.003859>
sync_file_range(0x15f, 0x1443e000, 0x2000, 0x2) = 0 <0.027975>
sync_file_range(0x15f, 0x14442000, 0x2000, 0x2) = 0 <0.000048>

And actually, that server had been running its DB instance on a centos6 VM
(kernel-2.6.32-754.23.1.el6.x86_64), shared with the appserver, to mitigate
another issue last year. I moved the DB back to its own centos7 VM
(kernel-3.10.0-862.14.4.el7.x86_64), and I can no longer see those slow calls.
It seems that if there's any issue (with postgres or otherwise), it's vastly
mitigated, or much harder to hit, under modern kernels.

I also found these:
https://github.com/torvalds/linux/commit/23d0127096cb91cb6d354bdc71bd88a7bae3a1d5 (master v5.5-rc6...v4.4-rc1)
https://github.com/torvalds/linux/commit/ee53a891f47444c53318b98dac947ede963db400 (master v5.5-rc6...v2.6.29-rc1)

The second commit may be the cause of the issue.

The first commit is supposedly too new to explain the difference between the
two kernels (it first appeared in v4.4-rc1), but I'm guessing Red Hat may have
backpatched it into their 3.10 kernel.

Thanks,
Justin

In response to

Re: should crash recovery ignore checkpoint_flush_after ? at 2020-01-18 23:32:02 from Andres Freund
