Re: WAL prefetch

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com>
Subject: Re: WAL prefetch
Date: 2018-06-15 15:15:11
Message-ID: ef234489-1875-cde1-1ff1-0a58de95fb9b@postgrespro.ru
Lists: pgsql-hackers

On 15.06.2018 18:03, Amit Kapila wrote:
> On Fri, Jun 15, 2018 at 1:08 PM, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>>
>> On 15.06.2018 07:36, Amit Kapila wrote:
>>> On Fri, Jun 15, 2018 at 12:16 AM, Stephen Frost <sfrost(at)snowman(dot)net>
>>> wrote:
>>>>> I have tested wal_prefetch on two powerful servers with 24 cores, a 3TB
>>>>> NVMe RAID 10 storage device and 256GB of RAM, connected using InfiniBand.
>>>>> The speed of synchronous replication between the two nodes increased from
>>>>> 56k TPS to 60k TPS (on pgbench with scale 1000).
>>>> I'm also surprised that it wasn't a larger improvement.
>>>>
>>>> Seems like it would make sense to implement in core using
>>>> posix_fadvise(), perhaps in the wal receiver and in RestoreArchivedFile
>>>> or nearby.. At least, that's the thinking I had when I was chatting w/
>>>> Sean.
>>>>
>>> Doing in-core certainly has some advantage such as it can easily reuse
>>> the existing xlog code rather than trying to make a copy as is currently
>>> done in the patch, but I think it also depends on whether this is
>>> really a win in a number of common cases or is it just a win in some
>>> limited cases.
>>>
>> I completely agree. That was my main concern: in which use cases this
>> prefetch will be efficient.
>> If "full_page_writes" is on (which is the safe and default value), then the
>> first update of a page since the last checkpoint is written to WAL as a full
>> page, and applying it does not require reading any data from disk.
>>
> What exactly do you mean by the above? AFAIU, it needs to read WAL to apply
> the full page image. See the code below:
>
> XLogReadBufferForRedoExtended()
> {
> ..
> /* If it has a full-page image and it should be restored, do it. */
> if (XLogRecBlockImageApply(record, block_id))
> {
> Assert(XLogRecHasBlockImage(record, block_id));
> *buf = XLogReadBufferExtended(rnode, forknum, blkno,
> get_cleanup_lock ? RBM_ZERO_AND_CLEANUP_LOCK : RBM_ZERO_AND_LOCK);
> page = BufferGetPage(*buf);
> if (!RestoreBlockImage(record, block_id, page))
> ..
> }
>
>

Sorry for my confusing statement.
We definitely need to read the page image from WAL.
I meant that in the case of a full-page write we do not need to read the
updated page from the database; it can simply be overwritten.

pg_prefaulter and my wal_prefetch do not prefetch the WAL pages themselves.
There is no point in doing so, because they have just been written by
wal_receiver and so should still be present in the file system cache.
wal_prefetch prefetches the blocks referenced by WAL records. But in the
case of full-page writes such a prefetch is not needed and is even harmful,
because it reads from disk a page that is about to be overwritten.
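
To illustrate the rule, here is a minimal sketch in C, not the actual
wal_prefetch code. SketchBlockRef, sketch_should_prefetch,
sketch_prefetch_refs and SKETCH_BLCKSZ are hypothetical names invented for
this example; the only real API used is posix_fadvise(). It assumes the
block references have already been decoded from WAL records and that the
relation files are already open.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L    /* expose posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_BLCKSZ 8192         /* assumed page size (8 kB) */

/* Hypothetical, simplified stand-in for a block reference in a WAL record. */
typedef struct SketchBlockRef
{
    int      fd;        /* open descriptor of the relation file */
    unsigned blkno;     /* block number within that file */
    bool     has_fpi;   /* record carries a full-page image to be restored */
} SketchBlockRef;

/*
 * A block restored from a full-page image is overwritten during redo,
 * so its old contents are never read and prefetching it only wastes I/O.
 */
static bool
sketch_should_prefetch(const SketchBlockRef *ref)
{
    return !ref->has_fpi;
}

/*
 * Ask the kernel to read ahead every referenced block that redo will
 * actually have to read. Returns the number of hints accepted.
 */
static size_t
sketch_prefetch_refs(const SketchBlockRef *refs, size_t nrefs)
{
    size_t issued = 0;

    assert(refs != NULL || nrefs == 0);

    for (size_t i = 0; i < nrefs; i++)
    {
        off_t offset;

        if (!sketch_should_prefetch(&refs[i]))
            continue;

        offset = (off_t) refs[i].blkno * SKETCH_BLCKSZ;

        /* posix_fadvise() returns 0 on success, an error number otherwise. */
        if (posix_fadvise(refs[i].fd, offset, SKETCH_BLCKSZ,
                          POSIX_FADV_WILLNEED) == 0)
            issued++;
    }
    return issued;
}
```

The hint is only advisory, so a failed posix_fadvise() call is simply not
counted; redo would still read the page in the normal way.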

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
