Re: WIP: WAL prefetch (another approach)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2021-04-09 03:37:04
Message-ID: 20210409033703.GP6592@telsasoft.com
Lists: pgsql-hackers

Here are some small language fixes.

BTW, before beginning "recovery", PG syncs all the data dirs.
This can be slow, and the slowness often seems to be due to reading file
metadata. That's an obvious consequence of an OS crash, after which the page
cache is empty. I've made a habit of running "find /zfs -ls |wc" to pre-warm
it. That takes a little while, but then the recovery process starts moments
later. I don't have any timing measurements, but I expect that starting to
stat() all data files as soon as possible would be a win.
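To illustrate the idea, here is a minimal sketch of such a pre-warm pass. It assumes a POSIX system. The function walks a directory tree and lstat()s every entry, so the kernel's metadata caches are populated before the sync starts. prewarm_metadata() is a hypothetical name, and nothing like it exists in PG today. It uses lstat() rather than stat() and does not follow symlinks, so tablespaces linked from pg_tblspc would need to be walked separately.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: recursively lstat() every entry under 'path' so that
 * file metadata is cached before the data directory is synced.
 * Returns the number of entries stat'ed, or -1 if 'path' itself can't be.
 */
static long
prewarm_metadata(const char *path)
{
	struct stat st;
	long		count;
	DIR		   *dir;
	struct dirent *de;

	if (lstat(path, &st) != 0)
		return -1;
	count = 1;

	/* Only descend into real directories; symlinks are not followed. */
	if (!S_ISDIR(st.st_mode))
		return count;

	dir = opendir(path);
	if (dir == NULL)
		return count;			/* stat'ed but unreadable: keep what we have */

	while ((de = readdir(dir)) != NULL)
	{
		char		child[4096];
		long		sub;

		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		if (snprintf(child, sizeof(child), "%s/%s", path, de->d_name) >= (int) sizeof(child))
			continue;			/* path too long for the buffer: skip it */
		sub = prewarm_metadata(child);
		if (sub > 0)
			count += sub;
	}
	closedir(dir);
	return count;
}
```

Calling prewarm_metadata("/zfs") does roughly the same metadata work as the find command above. A real version would presumably want to run it early, or in parallel with other startup work.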

commit cc9707de333fe8242607cde9f777beadc68dbf04
Author: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
Date: Thu Apr 8 10:43:14 2021 -0500

WIP: doc review: Optionally prefetch referenced data in recovery.

1d257577e08d3e598011d6850fd1025858de8c8c

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bc4a8b2279..139dee7aa2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3621,7 +3621,7 @@ include_dir 'conf.d'
pool after that. However, on file systems with a block size larger
than
<productname>PostgreSQL</productname>'s, prefetching can avoid a
- costly read-before-write when a blocks are later written.
+ costly read-before-write when blocks are later written.
The default is off.
</para>
</listitem>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 24cf567ee2..36e00c92c2 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -816,9 +816,7 @@
prefetching mechanism is most likely to be effective on systems
with <varname>full_page_writes</varname> set to
<varname>off</varname> (where that is safe), and where the working
- set is larger than RAM. By default, prefetching in recovery is enabled
- on operating systems that have <function>posix_fadvise</function>
- support.
+ set is larger than RAM. By default, prefetching in recovery is disabled.
</para>
</sect1>

diff --git a/src/backend/access/transam/xlogprefetch.c b/src/backend/access/transam/xlogprefetch.c
index 28764326bc..363c079964 100644
--- a/src/backend/access/transam/xlogprefetch.c
+++ b/src/backend/access/transam/xlogprefetch.c
@@ -31,7 +31,7 @@
* stall; this is counted with "skip_fpw".
*
* The only way we currently have to know that an I/O initiated with
- * PrefetchSharedBuffer() has that recovery will eventually call ReadBuffer(),
+ * PrefetchSharedBuffer() has that recovery will eventually call ReadBuffer(), XXX: what ??
* and perform a synchronous read. Therefore, we track the number of
* potentially in-flight I/Os by using a circular buffer of LSNs. When it's
* full, we have to wait for recovery to replay records so that the queue
@@ -660,7 +660,7 @@ XLogPrefetcherScanBlocks(XLogPrefetcher *prefetcher)
/*
* I/O has possibly been initiated (though we don't know if it was
* already cached by the kernel, so we just have to assume that it
- * has due to lack of better information). Record this as an I/O
+ * was due to lack of better information). Record this as an I/O
* in progress until eventually we replay this LSN.
*/
XLogPrefetchIncrement(&SharedStats->prefetch);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 090abdad8b..8c72ba1f1a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2774,7 +2774,7 @@ static struct config_int ConfigureNamesInt[] =
{
{"wal_decode_buffer_size", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
gettext_noop("Maximum buffer size for reading ahead in the WAL during recovery."),
- gettext_noop("This controls the maximum distance we can read ahead n the WAL to prefetch referenced blocks."),
+ gettext_noop("This controls the maximum distance we can read ahead in the WAL to prefetch referenced blocks."),
GUC_UNIT_BYTE
},
&wal_decode_buffer_size,
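The xlogprefetch.c comment above describes tracking potentially in-flight I/Os with a circular buffer of LSNs. The following is a rough sketch of that bookkeeping. It assumes LSNs are recorded in increasing order as the WAL is scanned, so the oldest entry is always at the tail.

* Each initiated prefetch records its LSN.
* A full ring means the prefetcher must wait for replay.
* An entry is retired once replay reaches its LSN.

InFlightQueue, inflight_push(), inflight_retire() and MAX_IN_FLIGHT are hypothetical names, not the patch's actual code. uint64_t stands in for XLogRecPtr.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;			/* stand-in for XLogRecPtr */

#define MAX_IN_FLIGHT 4			/* hypothetical queue depth */

/* Hypothetical ring of LSNs at which prefetch I/Os were initiated. */
typedef struct
{
	Lsn			lsns[MAX_IN_FLIGHT];
	int			head;			/* next slot to write */
	int			tail;			/* oldest in-flight entry */
	int			count;			/* number of entries in the ring */
} InFlightQueue;

/* Record an I/O initiated for a block referenced at 'lsn', if there's room. */
static bool
inflight_push(InFlightQueue *q, Lsn lsn)
{
	if (q->count == MAX_IN_FLIGHT)
		return false;			/* full: caller must wait for replay */
	q->lsns[q->head] = lsn;
	q->head = (q->head + 1) % MAX_IN_FLIGHT;
	q->count++;
	return true;
}

/*
 * Once replay has reached 'replayed_lsn', any I/O recorded at or before it is
 * assumed to have been consumed by ReadBuffer(), so retire those entries.
 */
static void
inflight_retire(InFlightQueue *q, Lsn replayed_lsn)
{
	while (q->count > 0 && q->lsns[q->tail] <= replayed_lsn)
	{
		q->tail = (q->tail + 1) % MAX_IN_FLIGHT;
		q->count--;
	}
}
```

Because entries are only ever retired from the tail, this relies on the LSNs being pushed in order. That holds if the prefetcher reads the WAL sequentially.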
