Re: Parallel copy

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-07-23 04:51:12
Message-ID: CAE9k0PkY1cT2Ax9B4TrYHCPw_YNibWJQ0wBNiPDTXpQ0_aXS0Q@mail.gmail.com
Lists: pgsql-hackers

I think that, when doing performance testing with a partitioned table, it
would be good to also mention the distribution of data in the input file.
One possible distribution: say we have 100 tuples in the input file, and
every consecutive tuple belongs to a different partition.
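A small sketch of how such an input file could be generated. It assumes the target table is range-partitioned on its first (integer) column into four partitions of 1000 keys each (FROM (0) TO (1000), FROM (1000) TO (2000), and so on). That layout and the helper names (interleaved_key, partition_of, write_interleaved_csv) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical layout (an assumption, not from the thread): NPARTS range
 * partitions on an integer key, partition k holding keys in
 * [k * PART_WIDTH, (k + 1) * PART_WIDTH).
 */
#define NPARTS     4
#define PART_WIDTH 1000L
#define NTUPLES    100L

/* Partition that a key is routed to under the layout above. */
static long partition_of(long key)
{
    return key / PART_WIDTH;
}

/*
 * Key for the i-th input tuple: tuple i goes to partition i % NPARTS, so
 * every pair of consecutive tuples targets different partitions. Keys stay
 * unique (and inside their partition) while i < NPARTS * PART_WIDTH.
 */
static long interleaved_key(long i)
{
    assert(i >= 0 && i < NPARTS * PART_WIDTH);
    return (i % NPARTS) * PART_WIDTH + i / NPARTS;
}

/* Write ntuples CSV lines of the form "key,filler" to out. */
static void write_interleaved_csv(FILE *out, long ntuples)
{
    for (long i = 0; i < ntuples; i++)
        fprintf(out, "%ld,row%ld\n", interleaved_key(i), i);
}
```

Calling write_interleaved_csv(stdout, NTUPLES) from a main() emits the 100 rows, which can then be loaded with COPY FROM in csv format. This is the worst case for routing, since no two adjacent tuples share a partition.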

On Thu, Jul 23, 2020 at 8:51 AM Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Wed, Jul 22, 2020 at 7:56 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > Thanks for reviewing and providing the comments, Ashutosh.
> > Please find my thoughts below:
> >
> > On Fri, Jul 17, 2020 at 7:18 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> > >
> > > Some review comments, mostly on the leader-side code changes:
> > >
> > > 3) Should we allow Parallel Copy when the insert method is CIM_MULTI_CONDITIONAL?
> > >
> > > +	/* Check if the insertion mode is single. */
> > > +	if (FindInsertMethod(cstate) == CIM_SINGLE)
> > > +		return false;
> > >
> > > I know we have added checks in CopyFrom() to ensure that if any trigger
> > > (before row or instead of) is found on any of the partitions being
> > > loaded with data, the COPY FROM operation would fail, but does that mean
> > > we are okay to perform parallel copy on a partitioned table? Have we
> > > done any performance testing with a partitioned table where the data in
> > > the input file needs to be routed to different partitions?
> > >
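On the CIM_MULTI_CONDITIONAL question: the quoted check rejects only CIM_SINGLE. CIM_MULTI_CONDITIONAL is the method PostgreSQL's copy.c uses when batching depends on the partition a tuple is routed to, so it passes that check. The sketch below combines that check with the partition trigger rule described above. It is a simplification under stated assumptions. The enum reuses the copy.c names but is redeclared locally. PartitionTriggerInfo and parallel_copy_allowed are hypothetical. And where the patch reportedly makes COPY FROM fail when it meets such a trigger, the sketch returns an up-front yes/no instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same names as CopyInsertMethod in PostgreSQL's copy.c; redeclared here
 * so that the sketch compiles on its own. */
typedef enum CopyInsertMethod
{
    CIM_SINGLE,             /* insert one tuple at a time */
    CIM_MULTI,              /* always use multi-insert batches */
    CIM_MULTI_CONDITIONAL   /* batch only when the target partition allows it */
} CopyInsertMethod;

/* Hypothetical per-partition trigger summary (not a PostgreSQL struct). */
typedef struct PartitionTriggerInfo
{
    bool has_before_row_insert;
    bool has_instead_of_insert;
} PartitionTriggerInfo;

/*
 * Hypothetical gate: rejects CIM_SINGLE like the quoted patch hunk, and
 * rejects any partition carrying a before row or instead of insert trigger.
 * CIM_MULTI_CONDITIONAL passes the first test, which is the case the
 * review question is about.
 */
static bool
parallel_copy_allowed(CopyInsertMethod method,
                      const PartitionTriggerInfo *parts, size_t nparts)
{
    assert(parts != NULL || nparts == 0);

    if (method == CIM_SINGLE)
        return false;

    for (size_t i = 0; i < nparts; i++)
    {
        if (parts[i].has_before_row_insert || parts[i].has_instead_of_insert)
            return false;
    }
    return true;
}
```

The point the sketch makes explicit is that, with a trigger-free set of partitions, CIM_MULTI_CONDITIONAL is treated exactly like CIM_MULTI. That is why performance numbers on a partitioned target with tuples spread across partitions seem worth having.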
> >
> > Partitioned data is handled as Amit described in one of the earlier
> > mails [1]. My colleague Bharath has run performance tests with a
> > partitioned table; he will be sharing the results.
> >
>
> I ran tests for partitioned use cases; the results are similar to those
> of the non-partitioned cases [1].
>
> Test case 1: COPY FROM a CSV file, 5.1GB, 10 million tuples, 4 range
> partitions, 3 indexes on integer columns, unique data.
> Test case 2: COPY FROM a CSV file, 5.1GB, 10 million tuples, 4 range
> partitions, unique data.
>
> Execution time in seconds (speedup relative to 0 workers):
>
> parallel workers | test case 1     | test case 2
> 0                | 205.403 (1X)    | 135 (1X)
> 2                | 114.724 (1.79X) | 59.388 (2.27X)
> 4                | 99.017 (2.07X)  | 56.742 (2.34X)
> 8                | 99.722 (2.06X)  | 66.323 (2.03X)
> 16               | 98.147 (2.09X)  | 66.054 (2.04X)
> 20               | 97.723 (2.1X)   | 66.389 (2.03X)
> 30               | 97.048 (2.11X)  | 70.568 (1.91X)
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>
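For reading Bharath's numbers above: each speedup is the 0-worker time divided by the N-worker time, e.g. 205.403 / 114.724 ≈ 1.79. Below is a small C sketch with the figures copied from the table. The helper names (speedup, fastest_run) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Worker counts and execution times (seconds) copied from the table above. */
static const int workers[] = {0, 2, 4, 8, 16, 20, 30};
static const double case1_sec[] = {205.403, 114.724, 99.017, 99.722,
                                   98.147, 97.723, 97.048};
static const double case2_sec[] = {135.0, 59.388, 56.742, 66.323,
                                   66.054, 66.389, 70.568};
#define NRUNS (sizeof workers / sizeof workers[0])

/* Speedup of one run relative to the serial (0-worker) baseline. */
static double speedup(double baseline_sec, double run_sec)
{
    assert(baseline_sec > 0.0 && run_sec > 0.0);
    return baseline_sec / run_sec;
}

/* Index of the fastest run in an array of execution times. */
static size_t fastest_run(const double *times, size_t n)
{
    size_t best = 0;

    for (size_t i = 1; i < n; i++)
        if (times[i] < times[best])
            best = i;
    return best;
}
```

Applied to both arrays, fastest_run shows the same picture as the table. Test case 1 is fastest at 30 workers but gains little beyond 4. Test case 2 peaks at 4 workers and gets slower as more workers are added.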
