Re: Simplified version of read_binary_file (src/backend/utils/adt/genfile.c)

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Simplified version of read_binary_file (src/backend/utils/adt/genfile.c)
Date: 2020-09-15 17:43:56
Message-ID: CAEudQArpwWoA1qzpuC3MX_-0HdQi=qsgOyeU+oh3v8HRfEOULA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Em sex., 11 de set. de 2020 às 18:43, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
escreveu:

> Em sex., 11 de set. de 2020 às 17:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> escreveu:
>
>> Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> writes:
>> > New version, with support to read Virtual File (pipe, FIFO and socket).
>> > With assert, in case, erroneous, of trying to read a pipe, with offset.
>>
>> Really, could you do a little more thinking and testing of your own,
>> rather than expecting the rest of us to point out holes in your thinking?
>>
> Yes, I can.
>
>
>> * bytes_to_read == 0 is a legal case.
>>
> Ok. Strange call to read a file with zero bytes.
>
>
>> * "Assert(seek_offset != 0)" is an equally awful idea.
>>
> I was trying to predict the case of reading a Virtual File, with
> bytes_to_read == -1 and seek_offset == 0,
> which is bound to fail in fseeko.
> In any case, 96d1f423f95d it will fail with any Virtual File.
>
>
>>
>> * Removing the seek code from the negative-bytes_to_read path
>> is just broken.
>>
> Ok.
>
>
>> * The only reason this code is shorter than the previous version is
>> you took out all the comments explaining what it was doing, and
>> failed to replace them. This is just as subtle as before, so I
>> don't find that acceptable.
>>
> It was not my intention.
>
>
>>
>> I do agree that it might be worth skipping the fseeko call in the
>> probably-common case where seek_offset is zero. Otherwise I don't
>> see much reason to change what's there.
>>
> Well, I think that v3 is a little better, but it’s just my opinion.
> v3 try to copy directly in the final buffer, which will certainly be a
> little faster.
>
v4 patch, simple benchmark, read binary file with size > 6GB.

PostgreSQL main:
postgres=# \timing on
Timing is on.
postgres=# select
pg_read_file('c:\users\ranier\downloads\macOS_Catalina.7z');
ERROR: file length too large
Time: 3068,459 ms (00:03,068)
postgres=#

PostgreSQL patched (v4 read_binary_file):
postgres=# \timing on
Timing is on.
postgres=# select
pg_read_file('c:\users\ranier\downloads\macOS_Catalina.7z');
ERROR: file length too large
Time: 701,035 ms
postgres=#

4.37x faster, very good.

v4 handles pipes very well.
Tested with
https://docs.microsoft.com/en-us/windows/win32/ipc/multithreaded-pipe-server

Terminal one:
C:\usr\src\tests\pipes>pipe_server

Pipe Server: Main thread awaiting client connection on \\.\pipe\mynamedpipe

Terminal two:
postgres=# \timing on
Timing is on.
postgres=# select pg_read_file('\\.\pipe\mynamedpipe');

Terminal one:
Client connected, creating a processing thread.

Pipe Server: Main thread awaiting client connection on \\.\pipe\mynamedpipe
InstanceThread created, receiving and processing messages.

Pipe Server: Main thread awaiting client connection on \\.\pipe\mynamedpipe
InstanceThread created, receiving and processing messages.
^C
C:\usr\src\tests\pipes>

Terminal two:
postgres=# select pg_read_file('\\.\pipe\mynamedpipe');
pg_read_file
--------------

(1 row)

Time: 77267,481 ms (01:17,267)
postgres=#

regards,
Ranier Vilela

Attachment Content-Type Size
v4-0001-simplified_read_binary_file.patch application/octet-stream 4.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-09-15 17:54:18 Re: Simplified version of read_binary_file (src/backend/utils/adt/genfile.c)
Previous Message Tom Lane 2020-09-15 17:41:02 Re: pg_restore causing deadlocks on partitioned tables