Re: Improvements in Copy From

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: smithpb2250(at)gmail(dot)com
Cc: vignesh21(at)gmail(dot)com, dgrowleyml(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improvements in Copy From
Date: 2020-09-11 09:04:01
Message-ID: 20200911.180401.1250008268606505036.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 11 Sep 2020 18:44:13 +1000, Peter Smith <smithpb2250(at)gmail(dot)com> wrote in
> On Thu, Sep 10, 2020 at 9:21 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > Whether such a micro-optimisation is worth doing is another question.
> > Yes, what you suggested can also be done, but I have the same
> > question as you. Since it would save just one function call, and the
> > EOF check happens immediately inside that function, should we include
> > this or not?
>
> I expect the difference from my suggestion is too small to be measured.
>
> Probably it is not worth changing the already complicated code unless
> those changes can achieve something observable.
>
> ~~
>
> FYI, I ran a few performance tests BEFORE/AFTER applying your patch.
>
> Perf results for \COPY 5GB CSV file to UNLOGGED table.
>
> perf -a -g <pid>
> psql -d test -c "\copy tbl from '/my/path/data_5GB.csv' with (format csv);"
> perf report -g
>
> BEFORE
> #1 CopyReadLineText = 12.70%, CopyLoadRawBuf = 0.81%
> #2 CopyReadLineText = 12.54%, CopyLoadRawBuf = 0.81%
> #3 CopyReadLineText = 12.52%, CopyLoadRawBuf = 0.81%
>
> AFTER
> #1 CopyReadLineText = 12.55%, CopyLoadRawBuf = 1.20%
> #2 CopyReadLineText = 12.15%, CopyLoadRawBuf = 1.10%
> #3 CopyReadLineText = 13.11%, CopyLoadRawBuf = 1.24%
> #4 CopyReadLineText = 12.86%, CopyLoadRawBuf = 1.18%
>
> I didn't quite know how to interpret those results. They were the
> opposite of what I expected. Perhaps the original excessive
> CopyLoadRawBuf calls were so brief they could often avoid being
> sampled? Anyway, I hope you have a better understanding of perf than
> I do and can explain it.
>
> I then repeated and timed the same tests, but without perf.
>
> BEFORE
> #1 4min.36s
> #2 4min.45s
> #3 4min.43s
> #4 4min.34s
>
> AFTER
> #1 4min.41s
> #2 4min.37s
> #3 4min.34s
>
> As you can see, unfortunately, the patch gave no observable benefit
> for my test case.

That observation agrees with my assumption.
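
As a rough check on the quoted wall-clock timings, the sketch below
averages the BEFORE and AFTER runs and compares the difference of the
means with the run-to-run spread. The helper names (to_seconds, mean)
are only for illustration, and with three or four runs per side this is
a sanity check, not a statistical test.

```python
# Wall-clock timings quoted above, converted to seconds.
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds

before = [to_seconds(4, 36), to_seconds(4, 45), to_seconds(4, 43), to_seconds(4, 34)]
after = [to_seconds(4, 41), to_seconds(4, 37), to_seconds(4, 34)]

def mean(values):
    return sum(values) / len(values)

before_mean = mean(before)
after_mean = mean(after)

# Spread = slowest run minus fastest run within one group.
before_spread = max(before) - min(before)
after_spread = max(after) - min(after)

# Positive means the AFTER runs were faster on average.
diff = before_mean - after_mean

print(f"BEFORE mean {before_mean:.1f}s, spread {before_spread}s")
print(f"AFTER  mean {after_mean:.1f}s, spread {after_spread}s")
print(f"difference of means {diff:.1f}s")
```

The difference of the means comes out at about two seconds, which is
smaller than the spread inside either group, so these numbers cannot
distinguish the patched build from the unpatched one.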

At Fri, 11 Sep 2020 15:58:04 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
me> we should do that. On the contrary, if incoming data were
me> intermittently delayed for some reasons (heavy load of client or
me> in-between network), this patch would make things worse by waiting for
me> delayed bits before processing already received bits.

It seems that a slow network alone is enough to cause that behavior,
even without any such trouble.
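
To make the concern concrete, here is a toy model, not PostgreSQL code.
It compares processing each chunk as soon as it arrives with waiting
until the buffer is full before processing anything. The function
names, arrival times, chunk sizes and per-unit cost are all made up for
illustration.

```python
def finish_time_eager(arrivals, sizes, cost_per_unit):
    """Process every chunk as soon as it has arrived."""
    finish = 0.0
    for arrival, size in zip(arrivals, sizes):
        # Start when the chunk is there and the previous work is done.
        finish = max(finish, arrival) + size * cost_per_unit
    return finish

def finish_time_fill_first(arrivals, sizes, cost_per_unit, buffer_size):
    """Wait until the buffer is full (or input ends) before processing."""
    finish = 0.0
    pending = 0
    last_index = len(arrivals) - 1
    for i, (arrival, size) in enumerate(zip(arrivals, sizes)):
        pending += size
        if pending >= buffer_size or i == last_index:
            # The whole batch is processed only after its last chunk arrived.
            finish = max(finish, arrival) + pending * cost_per_unit
            pending = 0
    return finish

sizes = [1, 1, 1, 1]
cost = 0.5          # hypothetical processing time per unit of data
buffer_size = 4

slow_arrivals = [0.0, 1.0, 2.0, 3.0]   # data trickles in over the network
fast_arrivals = [0.0, 0.0, 0.0, 0.0]   # data is available immediately

slow_eager = finish_time_eager(slow_arrivals, sizes, cost)
slow_fill = finish_time_fill_first(slow_arrivals, sizes, cost, buffer_size)
fast_eager = finish_time_eager(fast_arrivals, sizes, cost)
fast_fill = finish_time_fill_first(fast_arrivals, sizes, cost, buffer_size)

print("slow source:", slow_eager, "vs", slow_fill)
print("fast source:", fast_eager, "vs", fast_fill)
```

In this model the two strategies finish at the same time when the data
is available immediately, which matches a test reading a local file. When
the data trickles in, the fill-first strategy finishes later because the
processing of early chunks can no longer overlap with the wait for later
ones.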

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
