Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach)

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Sait Talha Nisanci <Sait(dot)Nisanci(at)microsoft(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach)
Date: 2020-08-27 18:26:42
Message-ID: 20200827182642.GO29590@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Sait Talha Nisanci (Sait(dot)Nisanci(at)microsoft(dot)com) wrote:
> OS version is Ubuntu 18.04.5 LTS.
> Filesystem is ext4 and block size is 4KB.

[...]

* Sait Talha Nisanci (Sait(dot)Nisanci(at)microsoft(dot)com) wrote:
> I have run some benchmarks for this patch. Overall it seems that there is a good improvement with the patch on recovery times:
>
> The VMs I used have 32GB RAM, pgbench is initialized with a scale factor 3000(so it doesn’t fit to memory, ~45GB).
>
> In order to avoid checkpoints during benchmark, max_wal_size(200GB) and checkpoint_timeout(200 mins) are set to a high value.
>
> The run is cancelled when there is a reasonable amount of WAL ( > 25GB). The recovery times are measured from the REDO logs.
>
> I have tried combination of SSD, HDD, full_page_writes = on/off and max_io_concurrency = 10/50, the recovery times are as follows (in seconds):
>
>                               | No prefetch | Default prefetch values | Default + max_io_concurrency = 50
> SSD, full_page_writes = on    |         852 |                     301 |                               197
> SSD, full_page_writes = off   |        1642 |                    1359 |                              1391
> HDD, full_page_writes = on    |        6027 |                    6345 |                              6390
> HDD, full_page_writes = off   |         738 |                     275 |                               192
>
> Default prefetch values:
> - Max_recovery_prefetch_distance = 256KB
> - Max_io_concurrency = 10
>
> It probably makes sense to compare each row separately as the size of WAL can be different.

Is WAL FPW compression enabled..? I'm trying to figure out how, given
what's been shared here, replaying 25GB of WAL is being sped up by 2.5x
thanks to prefetch in the SSD case. That prefetch is hurting in the HDD
case makes complete sense to me: we're spending time reading pages from
the HDD, which is entirely pointless work given that we're just going to
overwrite those pages entirely with FPWs.
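
As a rough check on the figures being discussed, the speedup for each row
of the quoted benchmark table can be recomputed as the no-prefetch time
divided by the prefetch time. This is only a sketch over the numbers
quoted above; the dictionary and the function name `speedups` are
illustrative and not part of the patch or of PostgreSQL.

```python
# Recovery times in seconds, copied from the quoted benchmark table.
# Column order: no prefetch, default prefetch, default + max_io_concurrency = 50.
recovery_seconds = {
    "SSD, full_page_writes = on": (852, 301, 197),
    "SSD, full_page_writes = off": (1642, 1359, 1391),
    "HDD, full_page_writes = on": (6027, 6345, 6390),
    "HDD, full_page_writes = off": (738, 275, 192),
}


def speedups(times):
    """Return baseline / time for each prefetch column (> 1 means faster)."""
    baseline = times[0]
    return tuple(round(baseline / t, 2) for t in times[1:])


for label, times in recovery_seconds.items():
    print(label, speedups(times))
```

A ratio below 1 means recovery got slower with prefetch, which is the
case for the row labelled "HDD, full_page_writes = on".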

Further, if there's 32GB of RAM, and WAL compression isn't enabled and
the WAL is only 25GB, then it's very likely that every page touched by
the WAL ends up in memory (shared buffers or fs cache), and with FPWs we
shouldn't ever need to actually read from the storage to get those
pages, right? So how is prefetch helping so much..?
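
The memory argument above can be put into numbers with a
back-of-the-envelope sketch. It assumes the default PostgreSQL block
size of 8kB, treats "GB" as GiB, and assumes every full-page image takes
up a whole block in the WAL (no compression, no hole removal); none of
these were confirmed in the thread. The function name is illustrative.

```python
GIB = 1024 ** 3

# Figures from the thread.
ram_bytes = 32 * GIB
wal_bytes = 25 * GIB

# Assumption: default 8kB block size, and each full-page image occupies
# a whole block in the WAL (no compression, no hole removal).
BLOCK_SIZE = 8192


def max_pages_with_full_images(wal_bytes, block_size=BLOCK_SIZE):
    """Upper bound on how many full-page images fit in this much WAL."""
    return wal_bytes // block_size


pages = max_pages_with_full_images(wal_bytes)
touched_bytes = pages * BLOCK_SIZE

print("pages with a full-page image (upper bound):", pages)
print("fits in RAM:", touched_bytes <= ram_bytes)
```

Under those assumptions the pages touched by the WAL add up to at most
the size of the WAL itself, which is below the 32GB of RAM on the test
VMs.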

I'm not sure that the 'full_page_writes = off' tests are very
interesting in this case, since you're going to get torn pages, and
therefore corruption, and hopefully no one is running with that
configuration on this OS/filesystem.

Thanks,

Stephen
