Re: pread() and pwrite()

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Cc: Oskari Saarenmaa <os(at)ohmu(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Tobias Oberstein <tobias(dot)oberstein(at)gmail(dot)com>
Subject: Re: pread() and pwrite()
Date: 2018-09-19 01:48:06
Message-ID: CAEepm=1DZSzdBRDbXFXSiu_Hiv08f22kFmgCi7ZBRERF_=ZCPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 7, 2018 at 2:17 AM Jesper Pedersen
<jesper(dot)pedersen(at)redhat(dot)com> wrote:
> > This needs a rebase again.

Done.

> Would it be of benefit to update these call sites
>
> * slru.c
> - SlruPhysicalReadPage
> - SlruPhysicalWritePage
> * xlogutils.c
> - XLogRead
> * pg_receivewal.c
> - FindStreamingStart
> * rewriteheap.c
> - heap_xlog_logical_rewrite
> * walreceiver.c
> - XLogWalRcvWrite

It certainly wouldn't hurt... but more pressing to get this committed
would be Windows support IMHO. I think the thing to do is to open
files with the FILE_FLAG_OVERLAPPED flag, and then use ReadFile() and
WriteFile() with an LPOVERLAPPED struct that holds an offset, but I'm
not sure if I can write that myself. I tried doing some semi-serious
Windows development for the fsyncgate patch using only AppVeyor CI a
couple of weeks ago and it was like visiting the dentist.

On Thu, Sep 6, 2018 at 6:42 AM Jesper Pedersen
<jesper(dot)pedersen(at)redhat(dot)com> wrote:
> Once resolved the patch passes make check-world, and a strace analysis
> shows the associated read()/write() have been turned into
> pread64()/pwrite64(). All lseek()'s are SEEK_END's.

Yeah :-) Just for fun, here is the truss output for a single pgbench
transaction here:

recvfrom(9,"B\0\0\0\^R\0P0_4\0\0\0\0\0\0\^A"...,8192,0,NULL,0x0) = 41 (0x29)
sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\nBEG"...,27,0,NULL,0) = 27 (0x1b)
recvfrom(9,"B\0\0\0&\0P0_5\0\0\0\0\^B\0\0\0"...,8192,0,NULL,0x0) = 61 (0x3d)
pread(22,"\0\0\0\0\^P\M^C\M-(at)P\0\0\0\0\M-X"...,8192,0x960a000) = 8192 (0x2000)
pread(20,"\0\0\0\0\M^X\^D\M^Iq\0\0\0\0\^T"...,8192,0x380fe000) = 8192 (0x2000)
sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\rUPD"...,30,0,NULL,0) = 30 (0x1e)
recvfrom(9,"B\0\0\0\^]\0P0_6\0\0\0\0\^A\0\0"...,8192,0,NULL,0x0) = 52 (0x34)
sendto(9,"2\0\0\0\^DT\0\0\0!\0\^Aabalance"...,75,0,NULL,0) = 75 (0x4b)
recvfrom(9,"B\0\0\0"\0P0_7\0\0\0\0\^B\0\0\0"...,8192,0,NULL,0x0) = 57 (0x39)
sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\rUPD"...,30,0,NULL,0) = 30 (0x1e)
recvfrom(9,"B\0\0\0!\0P0_8\0\0\0\0\^B\0\0\0"...,8192,0,NULL,0x0) = 56 (0x38)
lseek(29,0x0,SEEK_END) = 8192 (0x2000)
sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\rUPD"...,30,0,NULL,0) = 30 (0x1e)
recvfrom(9,"B\0\0\0003\0P0_9\0\0\0\0\^D\0\0"...,8192,0,NULL,0x0) = 74 (0x4a)
sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\^OIN"...,32,0,NULL,0) = 32 (0x20)
recvfrom(9,"B\0\0\0\^S\0P0_10\0\0\0\0\0\0\^A"...,8192,0,NULL,0x0) = 42 (0x2a)
pwrite(33,"\M^X\M-P\^E\0\^A\0\0\0\0\M-`\M^A"...,16384,0x81e000) = 16384 (0x4000)
fdatasync(0x21) = 0 (0x0)
sendto(9,"2\0\0\0\^Dn\0\0\0\^DC\0\0\0\vCOM"...,28,0,NULL,0) = 28 (0x1c)

There is only one lseek() left. I actually have another patch that
gets rid of even that (by caching sizes in SMgrRelation using a shared
invalidation counter which I'm not yet sure about). Then pgbench's
7-round-trip transaction makes only the strictly necessary 18 syscalls
(every one an explainable network message, disk page or sync).
Unpatched master has 5 extra lseek()s.

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
0001-Use-pread-pwrite-instead-of-lseek-read-write-v4.patch application/octet-stream 25.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-09-19 01:58:36 Re: Usage of epoch in txid_current
Previous Message Julian Paul 2018-09-19 01:24:29 Re: Code of Conduct