Re: [PATCH] Performance Improvement For Copy From Binary Files

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Performance Improvement For Copy From Binary Files
Date: 2020-07-13 02:32:30
Message-ID: CA+HiwqEuoAYrBGmDqFOnvOkpO2ZLBSE49VzoLKKdA+Bz-4vtcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 13, 2020 at 10:19 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > On Mon, Jul 13, 2020 at 4:06 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> >
> > This error showed up when cfbot tried it:
> >
> > COPY BINARY stud_emp FROM
> > '/home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/stud_emp.data';
> > +ERROR: could not read from COPY file: Bad address
>
> This is due to the recent commit
> cd22d3cdb9bd9963c694c01a8c0232bbae3ddcfb, in which we restricted the
> raw_buf and line_buf allocations for binary files. Since we are using
> raw_buf for this performance improvement feature, now, it's enough to
> restrict only line_buf for binary files. I made the changes
> accordingly in the v3 patch attached here.
>
> Regression tests(make check & make check-world) ran cleanly with the v3 patch.

Thank you Bharath. I was a bit surprised that you had also submitted
a patch to NOT allocate raw_buf for COPY FROM ... BINARY. :-)

> Please also find my responses for:
>
> Vignesh's comment:
>
> > On Sun, Jul 12, 2020 at 6:43 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > I had reviewed the changes. I felt one minor change required:
> > + * CopyReadFromRawBuf
> > + * Reads 'nbytes' bytes from cstate->copy_file via
> > cstate->raw_buf and
> > + * writes then to 'saveTo'
> > + *
> > + * Useful when reading binary data from the file.
> > Should "writes then to 'saveTo'" be "writes them to 'dest'"?
> >
>
> Thanks Vignesh for reviewing the patch. Modified 'saveTo' to 'dest' in v3 patch.

My bad.

> Amit's comment:
>
> >
> > > For the case where required nbytes may not fit into the buffer in
> > > CopyReadFromRawBuf, I'm sure this can happen only for field data,
> > > (field count , and field size are of fixed length and can fit in the
> > > buffer), instead of reading them in parts of buff size into the buffer
> > > (using CopyLoadRawBuf) and then DRAIN_COPY_RAW_BUF() to the
> > > destination, I think we can detect this condition using requested
> > > bytes and the buffer size and directly read from the file to the
> > > destination buffer and then reload the raw_buffer for further
> > > processing. I think this way, it will be good.
> >
> > Hmm, I'm afraid that this will make the code more complex for
> > apparently small benefit. Is this really that much of a problem
> > performance wise?
> >
>
> Yes it makes CopyReadFromRawBuf(), code a bit complex from a
> readability perspective. I'm convinced not to have the
> abovementioned(by me) change, due to 3 reasons,1) the
> readability/understandability 2) how many use cases can we have where
> requested field size greater than RAW_BUF_SIZE(64KB)? I think very few
> cases. I may be wrong here. 3) Performance wise it may not be much as
> we do one extra memcpy only in situations where field sizes are
> greater than 64KB(as we have already seen and observed by you as well
> in one of the response [1]) that memcpy cost for this case may be
> negligible.

Actually, an extra memcpy is incurred on every call of
CopyReadFromRawBuf(), but I haven't seen it to be very problematic.

By the way, considering the rebase over cd22d3cdb9b, it seemed to me
that we needed to update the comments in CopyStateData struct
definition a bit more. While doing that, I realized
CopyReadFromRawBuf as a name for the new function might be misleading
as long as we are only using it for binary data. Maybe
CopyReadBinaryData is more appropriate? See attached v4 with these
and a few other cosmetic changes.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
CopyFrom-binary-buffering_v4.patch application/octet-stream 8.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-07-13 03:16:57 Re: [PATCH] Performance Improvement For Copy From Binary Files
Previous Message Kyotaro Horiguchi 2020-07-13 01:57:36 Re: archive status ".ready" files may be created too early