Re: [PATCH] Performance Improvement For Copy From Binary Files

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Performance Improvement For Copy From Binary Files
Date: 2020-07-13 03:16:57
Message-ID: CALj2ACUefSOG=ffyhN4gNA3pCOhFwC8tpVFcBXY6utfvivYLMw@mail.gmail.com
Lists: pgsql-hackers

>
> > This is due to the recent commit
> > cd22d3cdb9bd9963c694c01a8c0232bbae3ddcfb, in which we restricted the
> > raw_buf and line_buf allocations for binary files. Since we are using
> > raw_buf for this performance improvement feature, it's now enough to
> > restrict only line_buf for binary files. I made the changes
> > accordingly in the v3 patch attached here.
> >
> > Regression tests (make check & make check-world) ran cleanly with the v3 patch.
>
> Thank you, Bharath. I was a bit surprised that you had also submitted
> a patch to NOT allocate raw_buf for COPY FROM ... BINARY. :-)
>

Yes, that was by me, before I started working on this feature. I think
we can backpatch that change (assuming we don't backpatch this
feature); I will make the request accordingly.

Anyway, now we don't allocate line_buf for binary files, which is also
a good thing.
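
To make that concrete, the allocation logic in BeginCopyFrom() now
looks roughly like this (a simplified sketch, not the exact copy.c
code; error handling and unrelated setup are omitted):

    /*
     * line_buf is only used by the text/CSV paths, so skip it for
     * binary files; raw_buf is now needed by the binary path too,
     * so it is allocated unconditionally.
     */
    if (!cstate->binary)
        initStringInfo(&cstate->line_buf);

    cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
    cstate->raw_buf_index = cstate->raw_buf_len = 0;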

>
> > > > For the case where the required nbytes may not fit into the buffer
> > > > in CopyReadFromRawBuf, I'm sure this can happen only for field data
> > > > (the field count and field size are of fixed length and always fit
> > > > in the buffer). Instead of reading it in buffer-sized chunks into
> > > > the buffer (using CopyLoadRawBuf) and then DRAIN_COPY_RAW_BUF() to
> > > > the destination, I think we can detect this condition from the
> > > > requested bytes and the buffer size, read directly from the file
> > > > into the destination buffer, and then reload the raw buffer for
> > > > further processing. I think this would be good.
> > >
> > > Hmm, I'm afraid that this will make the code more complex for
> > > apparently small benefit. Is this really that much of a problem
> > > performance-wise?
> > >
> >
> > Yes, it makes the CopyReadFromRawBuf() code a bit complex from a
> > readability perspective. I'm convinced not to make the above-mentioned
> > (by me) change, for three reasons: 1) readability/understandability;
> > 2) how many use cases can there be where the requested field size is
> > greater than RAW_BUF_SIZE (64KB)? I think very few, though I may be
> > wrong here; 3) performance-wise it may not matter much, as we do one
> > extra memcpy only in situations where field sizes are greater than
> > 64KB (as we have already seen, and as you observed in one of your
> > responses [1]), so the memcpy cost in that case may be negligible.
>
> Actually, an extra memcpy is incurred on every call of
> CopyReadFromRawBuf(), but I haven't found it to be very problematic.
>

Yes.
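
For readers of the thread, the shape of the function under discussion
is roughly the following (a simplified sketch of the patch's approach,
not the exact code; RAW_BUF_BYTES is shorthand for raw_buf_len -
raw_buf_index, and CopyLoadRawBuf is the existing copy.c helper):

    static int
    CopyReadFromRawBuf(CopyState cstate, char *dest, int nbytes)
    {
        int     copied_bytes = 0;

        if (RAW_BUF_BYTES(cstate) >= nbytes)
        {
            /* Fast path: enough bytes in the buffer, one memcpy. */
            memcpy(dest, cstate->raw_buf + cstate->raw_buf_index, nbytes);
            cstate->raw_buf_index += nbytes;
            copied_bytes = nbytes;
        }
        else
        {
            /*
             * Drain what is buffered, then reload and repeat; this
             * loops because nbytes can exceed RAW_BUF_SIZE for large
             * field data.
             */
            do
            {
                int     copy_bytes;

                if (RAW_BUF_BYTES(cstate) == 0)
                {
                    if (!CopyLoadRawBuf(cstate))
                        break;      /* EOF */
                }

                copy_bytes = Min(nbytes - copied_bytes,
                                 RAW_BUF_BYTES(cstate));
                memcpy(dest, cstate->raw_buf + cstate->raw_buf_index,
                       copy_bytes);
                cstate->raw_buf_index += copy_bytes;
                dest += copy_bytes;
                copied_bytes += copy_bytes;
            } while (copied_bytes < nbytes);
        }

        return copied_bytes;
    }

The extra memcpy you mention is the one in the fast path; the direct
file-to-destination read I proposed would have avoided it only for the
rare oversized-field case.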

>
> CopyReadFromRawBuf as a name for the new function might be misleading
> as long as we are only using it for binary data. Maybe
> CopyReadBinaryData is more appropriate? See attached v4 with these
> and a few other cosmetic changes.
>

CopyReadBinaryData() looks meaningful. +1.
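
With the rename, the call sites also read naturally; for example, the
fixed-length read of the per-tuple field count would look roughly like
this (a sketch based on the CopyGetInt16 helper in copy.c):

    static bool
    CopyGetInt16(CopyState cstate, int16 *val)
    {
        uint16      buf;

        /* A short read here means EOF or a truncated binary file. */
        if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) !=
            sizeof(buf))
        {
            *val = 0;           /* suppress compiler warning */
            return false;
        }
        *val = (int16) pg_ntoh16(buf);  /* network to host byte order */
        return true;
    }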

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
