From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Simplified version of read_binary_file (src/backend/utils/adt/genfile.c) |
Date: | 2020-09-15 17:43:56 |
Message-ID: | CAEudQArpwWoA1qzpuC3MX_-0HdQi=qsgOyeU+oh3v8HRfEOULA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Em sex., 11 de set. de 2020 às 18:43, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
escreveu:
> Em sex., 11 de set. de 2020 às 17:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> escreveu:
>
>> Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> writes:
>> > New version, with support to read Virtual File (pipe, FIFO and socket).
>> > With assert, in case, erroneous, of trying to read a pipe, with offset.
>>
>> Really, could you do a little more thinking and testing of your own,
>> rather than expecting the rest of us to point out holes in your thinking?
>>
> Yes, I can.
>
>
>> * bytes_to_read == 0 is a legal case.
>>
> Ok. Strange call to read a file with zero bytes.
>
>
>> * "Assert(seek_offset != 0)" is an equally awful idea.
>>
> I was trying to predict the case of reading a Virtual File, with
> bytes_to_read == -1 and seek_offset == 0,
> which is bound to fail in fseeko.
> In any case, 96d1f423f95d it will fail with any Virtual File.
>
>
>>
>> * Removing the seek code from the negative-bytes_to_read path
>> is just broken.
>>
> Ok.
>
>
>> * The only reason this code is shorter than the previous version is
>> you took out all the comments explaining what it was doing, and
>> failed to replace them. This is just as subtle as before, so I
>> don't find that acceptable.
>>
> It was not my intention.
>
>
>>
>> I do agree that it might be worth skipping the fseeko call in the
>> probably-common case where seek_offset is zero. Otherwise I don't
>> see much reason to change what's there.
>>
> Well, I think that v3 is a little better, but it’s just my opinion.
> v3 try to copy directly in the final buffer, which will certainly be a
> little faster.
>
v4 patch, simple benchmark, read binary file with size > 6GB.
PostgreSQL main:
postgres=# \timing on
Timing is on.
postgres=# select
pg_read_file('c:\users\ranier\downloads\macOS_Catalina.7z');
ERROR: file length too large
Time: 3068,459 ms (00:03,068)
postgres=#
PostgreSQL patched (v4 read_binary_file):
postgres=# \timing on
Timing is on.
postgres=# select
pg_read_file('c:\users\ranier\downloads\macOS_Catalina.7z');
ERROR: file length too large
Time: 701,035 ms
postgres=#
4.37x faster, very good.
v4 handles pipes very well.
Tested with
https://docs.microsoft.com/en-us/windows/win32/ipc/multithreaded-pipe-server
Terminal one:
C:\usr\src\tests\pipes>pipe_server
Pipe Server: Main thread awaiting client connection on \\.\pipe\mynamedpipe
Terminal two:
postgres=# \timing on
Timing is on.
postgres=# select pg_read_file('\\.\pipe\mynamedpipe');
Terminal one:
Client connected, creating a processing thread.
Pipe Server: Main thread awaiting client connection on \\.\pipe\mynamedpipe
InstanceThread created, receiving and processing messages.
Pipe Server: Main thread awaiting client connection on \\.\pipe\mynamedpipe
InstanceThread created, receiving and processing messages.
^C
C:\usr\src\tests\pipes>
Terminal two:
postgres=# select pg_read_file('\\.\pipe\mynamedpipe');
pg_read_file
--------------
(1 row)
Time: 77267,481 ms (01:17,267)
postgres=#
regards,
Ranier Vilela
Attachment | Content-Type | Size |
---|---|---|
v4-0001-simplified_read_binary_file.patch | application/octet-stream | 4.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2020-09-15 17:54:18 | Re: Simplified version of read_binary_file (src/backend/utils/adt/genfile.c) |
Previous Message | Tom Lane | 2020-09-15 17:41:02 | Re: pg_restore causing deadlocks on partitioned tables |