Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-06-17 04:10:09
Message-ID: CALDaNm1xd3k_471-M4yYE5Xvf-z3cn0b1Qc=pOtYmDbkYgfqig@mail.gmail.com
Lists: pgsql-hackers

Hi,

I have included tests for the parallel copy feature, and a few bugs
identified during testing have been fixed. Patches for the same are
attached.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

On Tue, Jun 16, 2020 at 3:21 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Jun 15, 2020 at 7:41 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > Thanks Amit for the clarifications. Regarding partitioned tables, one of the questions was - if we are loading data into a partitioned table using the COPY command, then the input file would contain tuples for different tables (partitions), unlike the normal table case where all the tuples in the input file belong to the same table. So, in such a case, how are we going to accumulate tuples into the DSM? I mean, will the leader process check which tuple needs to be routed to which partition and accumulate them into the DSM accordingly? For example, say the input data file has 10 tuples where the 1st tuple belongs to partition1, the 2nd belongs to partition2, and so on. In such cases, will the leader process accumulate all the tuples belonging to partition1 into one DSM and the tuples belonging to partition2 into some other DSM and assign them to the worker processes, or have we taken some other approach to handle this scenario?
> >
>
> No, all the tuples (for all partitions) will be accumulated in a
> single DSM and the workers/leader will route the tuple to an
> appropriate partition.
>
> > Further, I haven't got much time to look into the links that you have shared in your previous response. Will have a look into those and will also slowly start looking into the patches as and when I get some time. Thank you.
> >
>
> Yeah, it will be good if you go through all the emails once, because
> most of the decisions (and design) in the patch are supposed to be
> based on the discussion in this thread.
>
> Note - Please don't top post, try to give inline replies.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0005-Tests-for-parallel-copy.patch text/x-patch 20.3 KB
0004-Documentation-for-parallel-copy.patch text/x-patch 2.0 KB
0001-Copy-code-readjustment-to-support-parallel-copy.patch text/x-patch 16.8 KB
0002-Framework-for-leader-worker-in-parallel-copy.patch text/x-patch 32.5 KB
0003-Allow-copy-from-command-to-process-data-from-file-ST.patch text/x-patch 40.4 KB
0006-Parallel-Copy-For-Binary-Format-Files.patch text/x-patch 25.9 KB
