Re: [PATCH] Performance Improvement For Copy From Binary Files

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Performance Improvement For Copy From Binary Files
Date: 2020-07-27 02:29:54
Message-ID: CA+HiwqHz=xE2HHYARyem7SDnnNVOV9_wRD5SfEgbSk60Ad7C1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jul 26, 2020 at 6:06 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> > [ v7-0001-Improve-performance-of-binary-COPY-FROM-with-buff.patch ]
>
> Pushed with cosmetic changes.

Thanks for that.

> I'd always supposed that stdio does enough internal buffering that short
> fread()s shouldn't be much worse than memcpy(). But I reproduced your
> result of ~30% speedup for data with a lot of narrow text columns, using
> RHEL 8.2. Thinking this an indictment of glibc, I also tried it on
> current macOS, and saw an even bigger speedup, approaching 50%. So
> there's definitely something to this. I wonder if we ought to check
> other I/O-constrained users of fread and fwrite, like pg_dump/pg_restore.

Ah, maybe a good idea to check that.

> A point that I did not see addressed in the thread is whether this
> has any negative impact on the copy-from-frontend code path, where
> there's no fread() to avoid; short reads from CopyGetData() are
> already going to be satisfied by memcpy'ing from the fe_msgbuf.
> However, a quick check suggests that this patch is still a small
> win for that case too --- apparently the control overhead in
> CopyGetData() is not negligible.

Indeed.

> So the patch seems fine functionally, but there were some cosmetic
> things I didn't like:
>
> * Removing CopyGetInt32 and CopyGetInt16 seemed like a pretty bad
> idea, because it made the callers much uglier and more error-prone.
> This is a particularly bad example:
>
> /* Header extension length */
> - if (!CopyGetInt32(cstate, &tmp) ||
> - tmp < 0)
> + if (CopyReadBinaryData(cstate, (char *) &tmp, sizeof(tmp)) !=
> + sizeof(tmp) || (tmp = (int32) pg_ntoh32(tmp)) < 0)
>
> Putting side-effects into late stages of an if-condition is just
> awful coding practice. They're easy for a reader to miss and they
> are magnets for bugs, because of the possibility that control doesn't
> reach that part of the condition.
>
> You can get the exact same speedup without any of those disadvantages
> by marking these two functions "inline", so that's what I did.
>
> * I dropped the DRAIN_COPY_RAW_BUF macro too, as in my estimation it was
> a net negative for readability. With only two use-cases, having it made
> the code longer not shorter; I was also pretty unconvinced about the
> wisdom of having some of the loop's control logic inside the macro and
> some outside.
>
> * BTW, the macro definitions weren't particularly per project style
> anyway. We generally put at least one space before line-ending
> backslashes. I don't think pgindent will fix this for you; IME
> it doesn't touch macro definitions at all.
>
> * Did some more work on the comments.

Thanks for these changes.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-07-27 02:45:47 Re: Fast DSM segments
Previous Message Michael Paquier 2020-07-27 01:31:28 Re: handle a ECPG_bytea typo