Re: Getting rid of some more lseek() calls

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Getting rid of some more lseek() calls
Date: 2020-02-07 00:37:55
Message-ID: 20200207003755.cca2k7isvn6nxegr@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-02-07 12:38:27 +1300, Thomas Munro wrote:
> Continuing the work done in commits 0dc8ead4 and c24dcd0c, here are a
> few more places where we could throw away some code by switching to
> pg_pread() and pg_pwrite().

Nice.

> From 5723976510f30385385628758d7118042c4e4bf6 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> Date: Fri, 7 Feb 2020 12:04:43 +1300
> Subject: [PATCH 1/3] Use pg_pread() and pg_pwrite() in slru.c.
>
> This avoids lseek() system calls at every SLRU I/O, as was
> done for relation files in commit c24dcd0c.
> ---
> src/backend/access/transam/slru.c | 25 ++++---------------------
> 1 file changed, 4 insertions(+), 21 deletions(-)
>
> diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
> index d5b7a08f73..f9efb22311 100644
> --- a/src/backend/access/transam/slru.c
> +++ b/src/backend/access/transam/slru.c
> @@ -646,7 +646,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
> SlruShared shared = ctl->shared;
> int segno = pageno / SLRU_PAGES_PER_SEGMENT;
> int rpageno = pageno % SLRU_PAGES_PER_SEGMENT;
> - int offset = rpageno * BLCKSZ;
> + off_t offset = rpageno * BLCKSZ;
> char path[MAXPGPATH];
> int fd;
>
> @@ -676,17 +676,9 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
> return true;
> }
>
> - if (lseek(fd, (off_t) offset, SEEK_SET) < 0)
> - {
> - slru_errcause = SLRU_SEEK_FAILED;
> - slru_errno = errno;
> - CloseTransientFile(fd);
> - return false;
> - }
> -
> errno = 0;
> pgstat_report_wait_start(WAIT_EVENT_SLRU_READ);
> - if (read(fd, shared->page_buffer[slotno], BLCKSZ) != BLCKSZ)
> + if (pg_pread(fd, shared->page_buffer[slotno], BLCKSZ, offset) != BLCKSZ)
> {
> pgstat_report_wait_end();
> slru_errcause = SLRU_READ_FAILED;
> @@ -726,7 +718,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata)
> SlruShared shared = ctl->shared;
> int segno = pageno / SLRU_PAGES_PER_SEGMENT;
> int rpageno = pageno % SLRU_PAGES_PER_SEGMENT;
> - int offset = rpageno * BLCKSZ;
> + off_t offset = rpageno * BLCKSZ;
> char path[MAXPGPATH];
> int fd = -1;
>
> @@ -836,18 +828,9 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata)
> }
> }
>
> - if (lseek(fd, (off_t) offset, SEEK_SET) < 0)
> - {
> - slru_errcause = SLRU_SEEK_FAILED;
> - slru_errno = errno;
> - if (!fdata)
> - CloseTransientFile(fd);
> - return false;
> - }
> -
> errno = 0;
> pgstat_report_wait_start(WAIT_EVENT_SLRU_WRITE);
> - if (write(fd, shared->page_buffer[slotno], BLCKSZ) != BLCKSZ)
> + if (pg_pwrite(fd, shared->page_buffer[slotno], BLCKSZ, offset) != BLCKSZ)
> {
> pgstat_report_wait_end();
> /* if write didn't set errno, assume problem is no disk space */
> --
> 2.23.0

Hm. This still leaves us with one source of SLRU_SEEK_FAILED. And that's
really just for getting the file size. Should we rename that?

Perhaps we should just replace lseek(SEEK_END) with fstat()? Or at least
one wrapper function for getting the size? It seems ugly to change fd
positions just for the purpose of getting file sizes (and also implies
more kernel level locking, I believe).

> From 95d7187172f2ac6c08dc92f1043e1662b0dab4ac Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> Date: Fri, 7 Feb 2020 12:04:57 +1300
> Subject: [PATCH 2/3] Use pg_pwrite() in rewriteheap.c.
>
> This removes an lseek() call.
> ---
> src/backend/access/heap/rewriteheap.c | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
> index 5869922ff8..9c29bc0e0f 100644
> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -1156,13 +1156,6 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
> path, (uint32) xlrec->offset)));
> pgstat_report_wait_end();
>
> - /* now seek to the position we want to write our data to */
> - if (lseek(fd, xlrec->offset, SEEK_SET) != xlrec->offset)
> - ereport(ERROR,
> - (errcode_for_file_access(),
> - errmsg("could not seek to end of file \"%s\": %m",
> - path)));
> -
> data = XLogRecGetData(r) + sizeof(*xlrec);
>
> len = xlrec->num_mappings * sizeof(LogicalRewriteMappingData);
> @@ -1170,7 +1163,7 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
> /* write out tail end of mapping file (again) */
> errno = 0;
> pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE);
> - if (write(fd, data, len) != len)
> + if (pg_pwrite(fd, data, len, xlrec->offset) != len)
> {
> /* if write didn't set errno, assume problem is no disk space */
> if (errno == 0)

lgtm.

> From da6d712eeef2e3257d7fa672d95f2901bbe62887 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> Date: Fri, 7 Feb 2020 12:05:12 +1300
> Subject: [PATCH 3/3] Use pg_pwrite() in walreceiver.c.
>
> This gets rid of an lseek() call. While there was code to avoid
> it in most cases, it's better to lose the call AND the global state
> and code required to avoid it.
> ---
> src/backend/replication/walreceiver.c | 28 +++------------------------
> 1 file changed, 3 insertions(+), 25 deletions(-)
>
> diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
> index a5e85d32f3..2ab15c3cbb 100644
> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -85,14 +85,13 @@ WalReceiverFunctionsType *WalReceiverFunctions = NULL;
> #define NAPTIME_PER_CYCLE 100 /* max sleep time between cycles (100ms) */
>
> /*
> - * These variables are used similarly to openLogFile/SegNo/Off,
> + * These variables are used similarly to openLogFile/SegNo,
> * but for walreceiver to write the XLOG. recvFileTLI is the TimeLineID
> * corresponding the filename of recvFile.
> */
> static int recvFile = -1;
> static TimeLineID recvFileTLI = 0;
> static XLogSegNo recvSegNo = 0;
> -static uint32 recvOff = 0;
>
> /*
> * Flags set by interrupt handlers of walreceiver for later service in the
> @@ -945,7 +944,6 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
> use_existent = true;
> recvFile = XLogFileInit(recvSegNo, &use_existent, true);
> recvFileTLI = ThisTimeLineID;
> - recvOff = 0;
> }
>
> /* Calculate the start offset of the received logs */
> @@ -956,29 +954,10 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
> else
> segbytes = nbytes;
>
> - /* Need to seek in the file? */
> - if (recvOff != startoff)
> - {
> - if (lseek(recvFile, (off_t) startoff, SEEK_SET) < 0)
> - {
> - char xlogfname[MAXFNAMELEN];
> - int save_errno = errno;
> -
> - XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
> - errno = save_errno;
> - ereport(PANIC,
> - (errcode_for_file_access(),
> - errmsg("could not seek in log segment %s to offset %u: %m",
> - xlogfname, startoff)));
> - }
> -
> - recvOff = startoff;
> - }
> -
> /* OK to write the logs */
> errno = 0;
>
> - byteswritten = write(recvFile, buf, segbytes);
> + byteswritten = pg_pwrite(recvFile, buf, segbytes, (off_t) startoff);
> if (byteswritten <= 0)
> {
> char xlogfname[MAXFNAMELEN];
> @@ -995,13 +974,12 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
> (errcode_for_file_access(),
> errmsg("could not write to log segment %s "
> "at offset %u, length %lu: %m",
> - xlogfname, recvOff, (unsigned long) segbytes)));
> + xlogfname, startoff, (unsigned long) segbytes)));
> }
>
> /* Update state for write */
> recptr += byteswritten;
>
> - recvOff += byteswritten;
> nbytes -= byteswritten;
> buf += byteswritten;

lgtm.

There's still a few more lseek(SEEK_SET) calls in the backend after this
(walsender, miscinit, pg_stat_statements). It'd imo make sense to just
try to get rid of all of them in one series this time round?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2020-02-07 01:39:52 Re: table partitioning and access privileges
Previous Message Michael Paquier 2020-02-07 00:33:35 Re: Improve errors when setting incorrect bounds for SSL protocols