Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-06-17 04:10:09
Message-ID: CALDaNm1xd3k_471-M4yYE5Xvf-z3cn0b1Qc=pOtYmDbkYgfqig@mail.gmail.com
Lists: pgsql-hackers

Hi,

I have included tests for the parallel copy feature, and a few bugs
identified during testing have been fixed. Patches for the same are
attached.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

On Tue, Jun 16, 2020 at 3:21 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Jun 15, 2020 at 7:41 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > Thanks Amit for the clarifications. Regarding partitioned tables, one of the questions was - if we are loading data into a partitioned table using the COPY command, then the input file would contain tuples for different tables (partitions), unlike the normal table case where all the tuples in the input file belong to the same table. So, in such a case, how are we going to accumulate tuples into the DSM? I mean, will the leader process check which tuple needs to be routed to which partition and accumulate them into the DSM accordingly? For example, say the input data file has 10 tuples where the 1st tuple belongs to partition1, the 2nd belongs to partition2, and so on. In such cases, will the leader process accumulate all the tuples belonging to partition1 into one DSM and the tuples belonging to partition2 into some other DSM and assign them to the worker processes, or have we taken some other approach to handle this scenario?
> >
>
> No, all the tuples (for all partitions) will be accumulated in a
> single DSM and the workers/leader will route the tuple to an
> appropriate partition.
>
> > Further, I haven't got much time to look into the links that you have shared in your previous response. Will have a look into those and will also slowly start looking into the patches as and when I get some time. Thank you.
> >
>
> Yeah, it will be good if you go through all the emails once, because
> most of the decisions (and design) in the patch are supposed to be
> based on the discussion in this thread.
>
> Note - Please don't top post, try to give inline replies.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0005-Tests-for-parallel-copy.patch text/x-patch 20.3 KB
0004-Documentation-for-parallel-copy.patch text/x-patch 2.0 KB
0001-Copy-code-readjustment-to-support-parallel-copy.patch text/x-patch 16.8 KB
0002-Framework-for-leader-worker-in-parallel-copy.patch text/x-patch 32.5 KB
0003-Allow-copy-from-command-to-process-data-from-file-ST.patch text/x-patch 40.4 KB
0006-Parallel-Copy-For-Binary-Format-Files.patch text/x-patch 25.9 KB
