pread() and pwrite()

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Oskari Saarenmaa <os(at)ohmu(dot)fi>, Tobias Oberstein <tobias(dot)oberstein(at)gmail(dot)com>
Subject: pread() and pwrite()
Date: 2018-07-12 01:55:31
Message-ID: CAEepm=02rapCpPR3ZGF2vW=SBHSdFYO_bz_f-wwWJonmA3APgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

A couple of years ago, Oskari Saarenmaa proposed a patch[1] to adopt
$SUBJECT. Last year, Tobias Oberstein argued again that we should do
that[2]. At the end of that thread there was a +1 from multiple
committers in support of getting it done for PostgreSQL 12. Since
Oskari hasn't returned, I decided to dust off his patch and start a
new thread.

Here is a rebase and an initial review. I plan to address the
problems myself, unless Oskari wants to do that in which case I'll
happily review and hopefully eventually commit it. It's possible I
introduced new bugs while rebasing since basically everything moved
around, but it passes check-world here with and without HAVE_PREAD and
HAVE_PWRITE defined.

Let me summarise what's going on in the patch:

1. FileRead() and FileWrite() are replaced with FileReadAt() and
FileWriteAt(), taking a new offset argument. Now we can do one
syscall instead of two for random reads and writes.
2. fd.c no longer tracks seek position, so we lose a bunch of cruft
from the vfd machinery. FileSeek() now only supports SEEK_SET and
SEEK_END.

This is taking the position that we no longer want to be able to do
file IO with implicit positioning at all. I think that seems
reasonable: it's nice to drop all that position tracking and caching
code, and the seek-based approach would be incompatible with any
multithreading plans. I'm not even sure I'd bother adding "At" to the
function names. If there are any extensions that want the old
interface they will fail to compile either way. Note that the BufFile
interface remains the same, so hopefully that covers many use cases.

I guess the only remaining reason to use FileSeek() is to get the file
size? So I wonder why SEEK_SET remains valid in the patch... if my
suspicion is correct that only SEEK_END still has a reason to exist,
perhaps we should just kill FileSeek() and add FileSize() or something
instead?

pgstat_report_wait_start(wait_event_info);
+#ifdef HAVE_PREAD
+ returnCode = pread(vfdP->fd, buffer, amount, offset);
+#else
+ lseek(VfdCache[file].fd, offset, SEEK_SET);
returnCode = read(vfdP->fd, buffer, amount);
+#endif
pgstat_report_wait_end();

This obviously lacks error handling for lseek().

I wonder if anyone would want separate wait_event tracking for the
lseek() (though this codepath would be used by almost nobody,
especially if someone adds Windows support, so it's probably not worth
bothering with).

I suppose we could assume that if you have pread() you must also have
pwrite() and save on ./configure cycles.

I will add this to the next Festival of Commits, though clearly it
needs a bit more work before the festivities begin.

[1] https://www.postgresql.org/message-id/flat/a86bd200-ebbe-d829-e3ca-0c4474b2fcb7%40ohmu.fi
[2] https://www.postgresql.org/message-id/flat/b8748d39-0b19-0514-a1b9-4e5a28e6a208%40gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
0001-Use-pread-pwrite-instead-of-lseek-read-write-v1.patch application/octet-stream 23.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2018-07-12 01:58:20 Re: [HACKERS] Replication status in logical replication
Previous Message Yugo Nagata 2018-07-12 01:44:36 Re: Preferring index-only-scan when the cost is equal