Re: Improvements in Copy From

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: surafel3000(at)gmail(dot)com
Cc: vignesh21(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improvements in Copy From
Date: 2020-09-11 06:58:04
Message-ID: 20200911.155804.359271394064499501.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Thu, 10 Sep 2020 21:55:27 +0300, Surafel Temesgen <surafel3000(at)gmail(dot)com> wrote in
> On Thu, Sep 10, 2020 at 1:17 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> >
> > >
> > > We have a patch for a column-matching feature [1] that may need the
> > > header line to be processed further. Even without that, I think it is
> > > preferable, performance-wise, to process the header line for nothing
> > > rather than add those checks to the loop.
> >
> > I had seen that patch. I feel that the change to match the header, when
> > a header is specified, can be addressed in this patch if that patch gets
> > committed first, or vice versa. We are doing a lot of processing on data
> > that we don't need to do anything with. Shouldn't this be skipped when it
> > is not required? A similar check is also present in NextCopyFromRawFields
> > to skip the header.
> >
>
> The existing check is unavoidable, but we are better off without the checks
> added by the patch. For very large files the loop may iterate millions, if
> not billions, of times, and I am sure doing the check that many times will
> cause a more noticeable performance degradation than processing one extra
> line.

FWIW, I thought the same when I saw the additional if-conditions. They
cost more than they gain.
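A minimal C sketch of the trade-off, assuming a simplified line-oriented parser. The names process_line, sum_checked_in_loop and sum_header_consumed_first are hypothetical and are not PostgreSQL code. Variant A tests the header condition on every iteration, which is the shape the additional if-conditions give the loop; variant B runs the header line through the normal per-line work once, discarding the result, and then enters a loop with no header test at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for COPY's per-line parsing work. */
static size_t process_line(const char *line, size_t len)
{
    (void) line;
    return len;                 /* "work" = bytes consumed */
}

/* Length of the line starting at p, excluding its '\n' (if any). */
static size_t line_len(const char *p)
{
    const char *nl = strchr(p, '\n');

    return nl ? (size_t) (nl - p) : strlen(p);
}

/* Step past a line of length len and its '\n', if present. */
static const char *next_line(const char *p, size_t len)
{
    return p[len] == '\n' ? p + len + 1 : p + len;
}

/* Variant A: the header test is evaluated on every line. */
static size_t sum_checked_in_loop(const char *buf, bool header)
{
    size_t total = 0;
    size_t lineno = 0;

    assert(buf != NULL);
    for (const char *p = buf; *p != '\0'; lineno++)
    {
        size_t len = line_len(p);

        /* the header case matches once, but is tested on every line */
        if (!(header && lineno == 0))
            total += process_line(p, len);
        p = next_line(p, len);
    }
    return total;
}

/* Variant B: process the header once for nothing, then loop check-free. */
static size_t sum_header_consumed_first(const char *buf, bool header)
{
    size_t total = 0;
    const char *p = buf;

    assert(buf != NULL);
    if (header && *p != '\0')
    {
        size_t len = line_len(p);

        (void) process_line(p, len);    /* result deliberately discarded */
        p = next_line(p, len);
    }
    while (*p != '\0')
    {
        size_t len = line_len(p);

        total += process_line(p, len);
        p = next_line(p, len);
    }
    return total;
}
```

Both variants produce the same result; the difference is only where the header is dealt with. Whether the per-row branch is measurable depends on the compiler (it may unswitch such a loop on its own) and on the workload, so this shows the shape of the change rather than serving as a benchmark.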

For the first part, the patch exposes COPY_NEW_FE to CopyGetData, which
I don't think is knowledge that function should have. Since that
doesn't seem to offer a noticeable performance gain, I don't think we
should do it. On the contrary, if incoming data were intermittently
delayed for some reason (heavy load on the client or on the network in
between), this patch would make things worse by waiting for the delayed
bytes before processing the bytes already received.
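To illustrate the latency concern, here is a minimal POSIX sketch assuming a pipe or socket as the input. read_at_least() is a hypothetical helper, not CopyGetData itself. Calling it with minread equal to maxread waits until the whole buffer is filled (or EOF), so bytes that have already arrived sit idle while a slow client catches up; calling it with minread = 1 returns whatever is available so it can be processed right away.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd into buf until at least minread bytes
 * have arrived (or EOF), never more than maxread bytes in total.
 * Returns the number of bytes read (less than minread only at EOF),
 * or -1 on a read error.
 */
static ssize_t read_at_least(int fd, char *buf, size_t minread, size_t maxread)
{
    size_t total = 0;

    assert(minread <= maxread);

    while (total < minread)
    {
        ssize_t n = read(fd, buf + total, maxread - total);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* EOF: return the short count */
        total += (size_t) n;
    }
    return (ssize_t) total;
}
```

For example, with 5 bytes sitting in a pipe whose writer is still open, read_at_least(fd, buf, 1, sizeof buf) returns 5 at once, whereas read_at_least(fd, buf, sizeof buf, sizeof buf) blocks until more data or EOF arrives.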

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
