Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-06-23 06:52:02
Message-ID: CALDaNm0H3N9gK7CMheoaXkO99g=uAPA93nSZXu0xDarPyPY6sg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 23, 2020 at 8:07 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> I have attached the patch for the same with the fixes.

The patches were not applying on the head, attached the patches that can be
applied on head.
I have added a commitfest entry[1] for this feature.

[1] - https://commitfest.postgresql.org/28/2610/

On Tue, Jun 23, 2020 at 8:07 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:

> Thanks Ashutosh For your review, my comments are inline.
> On Fri, Jun 19, 2020 at 5:41 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
> wrote:
> >
> > Hi,
> >
> > I just got some time to review the first patch in the list i.e.
> 0001-Copy-code-readjustment-to-support-parallel-copy.patch. As the patch
> name suggests, it is just trying to reshuffle the existing code for COPY
> command here and there. There is no extra changes added in the patch as
> such, but still I do have some review comments, please have a look:
> >
> > 1) Can you please add some comments atop the new function
> PopulateAttributes() describing its functionality in detail. Further, this
> new function contains the code from BeginCopy() to set attribute level
> options used with COPY FROM such as FORCE_QUOTE, FORCE_NOT_NULL, FORCE_NULL
> etc. in cstate and along with that it also copies the code from BeginCopy()
> to set other infos such as client encoding type, encoding conversion etc.
> Hence, I think it would be good to give it some better name, basically
> something that matches with what actually it is doing.
> >
>
> There is no new code added in this function, some part of code from
> BeginCopy was made in to a new function as this part of code will also
> be required for the parallel copy workers before the workers start the
> actual copy operation. This code was made into a function to avoid
> duplication. Changed the function name to PopulateGlobalsForCopyFrom &
> added few comments.
>
> > 2) Again, the name for the new function CheckCopyFromValidity() doesn't
> look good to me. From the function name it appears as if it does the sanity
> check of the entire COPY FROM command, but actually it is just doing the
> sanity check for the target relation specified with COPY FROM. So, probably
> something like CheckTargetRelValidity would look more sensible, I think?
> TBH, I am not good at naming the functions so you can always ignore my
> suggestions about function and variable names :)
> >
>
> Changed as suggested.
> > 3) Any reason for not making CheckCopyFromValidity as a macro instead of
> a new function. It is just doing the sanity check for the target relation.
> >
>
> I felt there is reasonable number of lines in the function & it is not
> in performance intensive path, so I preferred function over macro.
> Your thoughts?
>
> > 4) Earlier in CopyReadLine() function while trying to clear the EOL
> marker from cstate->line_buf.data (copied data), we were not checking if
> the line read by CopyReadLineText() function is a header line or not, but I
> can see that your patch checks that before clearing the EOL marker. Any
> reason for this extra check?
> >
>
> If you see the caller of CopyReadLine, i.e. NextCopyFromRawFields does
> nothing for the header line, server basically calls CopyReadLine
> again, it is a kind of small optimization. Anyway server is not going
> to do anything with header line, I felt no need to clear EOL marker
> for header lines.
> /* on input just throw the header line away */
> if (cstate->cur_lineno == 0 && cstate->header_line)
> {
> cstate->cur_lineno++;
> if (CopyReadLine(cstate))
> return false; /* done */
> }
>
> cstate->cur_lineno++;
>
> /* Actually read the line into memory here */
> done = CopyReadLine(cstate);
> I think no need to make a fix for this. Your thoughts?
>
> > 5) I noticed the below spurious line removal in the patch.
> >
> > @@ -3839,7 +3953,6 @@ static bool
> > CopyReadLine(CopyState cstate)
> > {
> > bool result;
> > -
> >
>
> Fixed.
> I have attached the patch for the same with the fixes.
> Thoughts?
>
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
>

Attachment Content-Type Size
0001-Copy-code-readjustment-to-support-parallel-copy.patch application/x-patch 16.8 KB
0004-Documentation-for-parallel-copy.patch application/x-patch 2.0 KB
0003-Allow-copy-from-command-to-process-data-from-file-ST.patch application/x-patch 40.4 KB
0002-Framework-for-leader-worker-in-parallel-copy.patch application/x-patch 33.1 KB
0005-Tests-for-parallel-copy.patch application/x-patch 20.3 KB
0006-Parallel-Copy-For-Binary-Format-Files.patch application/x-patch 25.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2020-06-23 07:07:04 Re: Threading in BGWorkers (!)
Previous Message Amit Kapila 2020-06-23 06:27:30 Re: min_safe_lsn column in pg_replication_slots view