Re: WAL prefetch

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com>
Subject: Re: WAL prefetch
Date: 2018-06-15 15:03:01
Message-ID: CAA4eK1Jvo7zM4zjhkVQ1Uweg7L4iuxXrqZOtFrMC-0JDrA5ETA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 15, 2018 at 1:08 PM, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
>
> On 15.06.2018 07:36, Amit Kapila wrote:
>>
>> On Fri, Jun 15, 2018 at 12:16 AM, Stephen Frost <sfrost(at)snowman(dot)net>
>> wrote:
>>>>
>>>> I have tested wal_prefetch on two powerful servers with 24 cores, a 3TB
>>>> NVMe RAID 10 storage device and 256GB of RAM, connected using InfiniBand.
>>>> The speed of synchronous replication between the two nodes increased
>>>> from 56k TPS to 60k TPS (on pgbench with scale 1000).
>>>
>>> I'm also surprised that it wasn't a larger improvement.
>>>
>>> Seems like it would make sense to implement in core using
>>> posix_fadvise(), perhaps in the wal receiver and in RestoreArchivedFile
>>> or nearby. At least, that's the thinking I had when I was chatting w/
>>> Sean.
>>>
>> Doing it in core certainly has some advantages, such as being able to
>> easily reuse the existing xlog code rather than trying to make a copy,
>> as is currently done in the patch, but I think it also depends on
>> whether this is really a win in a number of common cases or just a win
>> in some limited cases.
>>
> I completely agree. It was my main concern: in which use cases this
> prefetch will be efficient.
> If "full_page_writes" is on (which is the safe and default value), then
> the first update of a page since the last checkpoint will be written to
> WAL as a full page, and applying it will not require reading any data
> from disk.
>

What exactly do you mean by the above? AFAIU, it needs to read the WAL to
apply the full-page image. See the code below:

XLogReadBufferForRedoExtended()
{
    ..
    /* If it has a full-page image and it should be restored, do it. */
    if (XLogRecBlockImageApply(record, block_id))
    {
        Assert(XLogRecHasBlockImage(record, block_id));
        *buf = XLogReadBufferExtended(rnode, forknum, blkno,
            get_cleanup_lock ? RBM_ZERO_AND_CLEANUP_LOCK : RBM_ZERO_AND_LOCK);
        page = BufferGetPage(*buf);
        if (!RestoreBlockImage(record, block_id, page))
        ..
}
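
As an aside on the posix_fadvise() idea quoted above, here is a minimal
sketch of what such a prefetch hint could look like. It is not taken from
the patch or from PostgreSQL; the helper name prefetch_block and the
BLOCK_SIZE constant are hypothetical, and an 8kB page size is assumed.

```c
#define _XOPEN_SOURCE 600   /* expose posix_fadvise() on glibc */
#include <fcntl.h>
#include <sys/types.h>

#define BLOCK_SIZE 8192     /* assumed page size, not read from any config */

/*
 * Hypothetical helper: hint the kernel that block `blkno` of the file open
 * on `fd` will be read soon, so it can start loading it into the page cache.
 *
 * Returns 0 on success, or an error number on failure. Note that
 * posix_fadvise() returns the error number directly rather than setting
 * errno.
 */
static int
prefetch_block(int fd, unsigned int blkno)
{
    off_t offset = (off_t) blkno * BLOCK_SIZE;

    return posix_fadvise(fd, offset, BLOCK_SIZE, POSIX_FADV_WILLNEED);
}
```

The hint is advisory only: the call returns without waiting for the data,
and the kernel is free to ignore it, so any benefit would depend on how far
ahead of the actual read it is issued.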

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
