Re: Improvements in Copy From

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Surafel Temesgen <surafel3000(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improvements in Copy From
Date: 2020-09-10 10:16:58
Message-ID: CALDaNm0HioOL3f5+YjAywWWenDg=azTL8oC=7r56c7zCce12pA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 7, 2020 at 1:19 PM Surafel Temesgen <surafel3000(at)gmail(dot)com> wrote:
>
>
> Hi Vignesh
>
> On Wed, Jul 1, 2020 at 3:46 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>>
>> Hi,
>>
>> While reviewing copy from I identified few improvements for copy from
>> that can be done :
>> a) copy from stdin copies lesser amount of data to buffer even though
>> space is available in buffer because minread was passed as 1 to
>> CopyGetData, Hence it only reads until the data read from libpq is
>> less than minread. This can be fixed by passing the actual space
>> available in buffer, this reduces the unnecessary frequent calls to
>> CopyGetData.
>
>
> why not applying the same optimization on file read ?

For file read this is already taken care as you can see from below code:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
if (ferror(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from COPY file: %m")));
if (bytesread == 0)
cstate->reached_eof = true;
break;

We do not have any condition to break unlike the case of stdin, we
read 1 * maxread size of data, So no need to do anything for it.

>
>>
>> c) Copy from reads header line and do nothing for the header line, we
>> need not clear EOL & need not convert to server encoding for the
>> header line.
>
>
> We have a patch for column matching feature [1] that may need a header line to be further processed. Even without that I think it is preferable to process the header line for nothing than adding those checks to the loop, performance-wise.

I had seen that patch, I feel that change to match the header if the
header is specified can be addressed in this patch if that patch gets
committed first or vice versa. We are doing a lot of processing for
the data which we need not do anything. Shouldn't this be skipped if
not required. Similar check is present in NextCopyFromRawFields also
to skip header.

Thoughts?

Regards,
VIgnesh
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Bartunov 2020-09-10 10:21:09 Re: Yet another fast GiST build
Previous Message Kyotaro Horiguchi 2020-09-10 10:02:06 Re: SIGQUIT handling, redux