Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-06-23 02:37:48
Message-ID: CALDaNm3x-isUdoppJPAxjegLsFGKuTrs0pTSJai7K_TAPu-+nA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Ashutosh For your review, my comments are inline.
On Fri, Jun 19, 2020 at 5:41 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>
> Hi,
>
> I just got some time to review the first patch in the list i.e. 0001-Copy-code-readjustment-to-support-parallel-copy.patch. As the patch name suggests, it is just trying to reshuffle the existing code for COPY command here and there. There is no extra changes added in the patch as such, but still I do have some review comments, please have a look:
>
> 1) Can you please add some comments atop the new function PopulateAttributes() describing its functionality in detail. Further, this new function contains the code from BeginCopy() to set attribute level options used with COPY FROM such as FORCE_QUOTE, FORCE_NOT_NULL, FORCE_NULL etc. in cstate and along with that it also copies the code from BeginCopy() to set other infos such as client encoding type, encoding conversion etc. Hence, I think it would be good to give it some better name, basically something that matches with what actually it is doing.
>

There is no new code added in this function, some part of code from
BeginCopy was made in to a new function as this part of code will also
be required for the parallel copy workers before the workers start the
actual copy operation. This code was made into a function to avoid
duplication. Changed the function name to PopulateGlobalsForCopyFrom &
added few comments.

> 2) Again, the name for the new function CheckCopyFromValidity() doesn't look good to me. From the function name it appears as if it does the sanity check of the entire COPY FROM command, but actually it is just doing the sanity check for the target relation specified with COPY FROM. So, probably something like CheckTargetRelValidity would look more sensible, I think? TBH, I am not good at naming the functions so you can always ignore my suggestions about function and variable names :)
>

Changed as suggested.
> 3) Any reason for not making CheckCopyFromValidity as a macro instead of a new function. It is just doing the sanity check for the target relation.
>

I felt there is reasonable number of lines in the function & it is not
in performance intensive path, so I preferred function over macro.
Your thoughts?

> 4) Earlier in CopyReadLine() function while trying to clear the EOL marker from cstate->line_buf.data (copied data), we were not checking if the line read by CopyReadLineText() function is a header line or not, but I can see that your patch checks that before clearing the EOL marker. Any reason for this extra check?
>

If you see the caller of CopyReadLine, i.e. NextCopyFromRawFields does
nothing for the header line, server basically calls CopyReadLine
again, it is a kind of small optimization. Anyway server is not going
to do anything with header line, I felt no need to clear EOL marker
for header lines.
/* on input just throw the header line away */
if (cstate->cur_lineno == 0 && cstate->header_line)
{
cstate->cur_lineno++;
if (CopyReadLine(cstate))
return false; /* done */
}

cstate->cur_lineno++;

/* Actually read the line into memory here */
done = CopyReadLine(cstate);
I think no need to make a fix for this. Your thoughts?

> 5) I noticed the below spurious line removal in the patch.
>
> @@ -3839,7 +3953,6 @@ static bool
> CopyReadLine(CopyState cstate)
> {
> bool result;
> -
>

Fixed.
I have attached the patch for the same with the fixes.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-Copy-code-readjustment-to-support-parallel-copy.patch text/x-patch 16.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-06-23 02:48:26 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Previous Message David Rowley 2020-06-23 02:29:12 Re: Parallel Seq Scan vs kernel read ahead