Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-07-22 14:26:25
Message-ID: CALDaNm1qZ-yDmFWjJ5OXX0LX0Xj299QffDj9MRpCj=O0=TBd+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for reviewing and providing the comments Ashutosh.
Please find my thoughts below:

On Fri, Jul 17, 2020 at 7:18 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
wrote:
>
> Some review comments (mostly) from the leader side code changes:
>
> 1) Do we need a DSM key for the FORCE_QUOTE option? I think FORCE_QUOTE
option is only used with COPY TO and not COPY FROM so not sure why you have
added it.
>
> PARALLEL_COPY_KEY_FORCE_QUOTE_LIST
>

Fixed

> 2) Should we be allocating the parallel copy data structure only when it
is confirmed that the parallel copy is allowed?
>
> pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
> cstate->pcdata = pcdata;
>
> Or, if you want it to be allocated before confirming if Parallel copy is
allowed or not, then I think it would be good to allocate it in
*cstate->copycontext* memory context so that when EndCopy is called towards
the end of the COPY FROM operation, the entire context itself gets deleted
thereby freeing the memory space allocated for pcdata. In fact it would be
good to ensure that all the local memory allocated inside the ctstate
structure gets allocated in the *cstate->copycontext* memory context.
>

Fixed

> 3) Should we allow Parallel Copy when the insert method is
CIM_MULTI_CONDITIONAL?
>
> + /* Check if the insertion mode is single. */
> + if (FindInsertMethod(cstate) == CIM_SINGLE)
> + return false;
>
> I know we have added checks in CopyFrom() to ensure that if any trigger
(before row or instead of) is found on any of partition being loaded with
data, then COPY FROM operation would fail, but does it mean that we are
okay to perform parallel copy on partitioned table. Have we done some
performance testing with the partitioned table where the data in the input
file needs to be routed to the different partitions?
>

Partition data is handled like what Amit had told in one of earlier mails
[1]. My colleague Bharath has run performance test with partition table,
he will be sharing the results.

> 4) There are lot of if-checks in IsParallelCopyAllowed function that are
checked in CopyFrom function as well which means in case of Parallel Copy
those checks will get executed multiple times (first by the leader and from
second time onwards by each worker process). Is that required?
>

It is called from BeginParallelCopy, This will be called only once. This
change is ok.

> 5) Should the worker process be calling this function when the leader has
already called it once in ExecBeforeStmtTrigger()?
>
> /* Verify the named relation is a valid target for INSERT */
> CheckValidResultRel(resultRelInfo, CMD_INSERT);
>

Fixed.

> 6) I think it would be good to re-write the comments atop
ParallelCopyLeader(). From the present comments it appears as if you were
trying to put the information pointwise but somehow you ended up putting in
a paragraph. The comments also have some typos like *line beaks* which
possibly means line breaks. This is applicable for other comments as well
where you
>

Fixed.

> 7) Is the following checking equivalent to IsWorker()? If so, it would be
good to replace it with an IsWorker like macro to increase the readability.
>
> (IsParallelCopy() && !IsLeader())
>

Fixed.

These have been fixed and the new patch is attached as part of my previous
mail.
[1] -
https://www.postgresql.org/message-id/CAA4eK1LQPxULxw8JpucX0PwzQQRk%3Dq4jG32cU1us2%2B-mtzZUQg%40mail.gmail.com

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-07-22 15:10:10 Re: [PATCH] fix GIN index search sometimes losing results
Previous Message vignesh C 2020-07-22 14:17:57 Re: Parallel copy