Re: Parallel copy

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-07-23 03:20:51
Message-ID: CALj2ACUEvYbad7Gjk8+GOdT2tNNfPbvvqLfdwcBNrPcin6zE_g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 22, 2020 at 7:56 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> Thanks for reviewing and providing the comments Ashutosh.
> Please find my thoughts below:
>
> On Fri, Jul 17, 2020 at 7:18 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > Some review comments (mostly) from the leader side code changes:
> >
> > 3) Should we allow Parallel Copy when the insert method is
> > CIM_MULTI_CONDITIONAL?
> >
> > + /* Check if the insertion mode is single. */
> > + if (FindInsertMethod(cstate) == CIM_SINGLE)
> > +     return false;
> >
> > I know we have added checks in CopyFrom() to ensure that if any trigger
> > (before row or instead of) is found on any of partition being loaded with
> > data, then COPY FROM operation would fail, but does it mean that we are
> > okay to perform parallel copy on partitioned table. Have we done some
> > performance testing with the partitioned table where the data in the input
> > file needs to be routed to the different partitions?
> >
>
> Partition data is handled like what Amit had told in one of earlier mails
> [1]. My colleague Bharath has run performance test with partition table,
> he will be sharing the results.
>

I ran tests for the partitioned use cases; the results are similar to those
of the non-partitioned cases [1].

Test case 1: copy from csv file, 5.1GB, 10 million tuples, 4 range
partitions, 3 indexes on integer columns, unique data
Test case 2: copy from csv file, 5.1GB, 10 million tuples, 4 range
partitions, unique data

parallel workers | test case 1 (exec time in sec) | test case 2 (exec time in sec)
-----------------+--------------------------------+-------------------------------
               0 | 205.403 (1X)                   | 135 (1X)
               2 | 114.724 (1.79X)                | 59.388 (2.27X)
               4 | 99.017 (2.07X)                 | 56.742 (2.34X)
               8 | 99.722 (2.06X)                 | 66.323 (2.03X)
              16 | 98.147 (2.09X)                 | 66.054 (2.04X)
              20 | 97.723 (2.1X)                  | 66.389 (2.03X)
              30 | 97.048 (2.11X)                 | 70.568 (1.91X)
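
The factors in parentheses appear to be the 0-worker execution time divided by the execution time at each worker count. Here is a minimal sketch of that arithmetic, using only the times reported above. The names TEST_CASE_1, TEST_CASE_2 and speedups are mine, for illustration only, and are not part of the patch or the test scripts. Recomputed factors may differ slightly from the reported ones if those were derived from unrounded timings.

```python
# Execution times in seconds, copied from the results above.
# Keys are the number of parallel workers (0 = non-parallel baseline).
TEST_CASE_1 = {0: 205.403, 2: 114.724, 4: 99.017, 8: 99.722,
               16: 98.147, 20: 97.723, 30: 97.048}
TEST_CASE_2 = {0: 135.0, 2: 59.388, 4: 56.742, 8: 66.323,
               16: 66.054, 20: 66.389, 30: 70.568}


def speedups(times):
    """Return {workers: baseline_time / time}, with 0 workers as baseline."""
    baseline = times[0]
    return {workers: baseline / t for workers, t in times.items()}


if __name__ == "__main__":
    for name, times in (("test case 1", TEST_CASE_1),
                        ("test case 2", TEST_CASE_2)):
        print(name)
        for workers, factor in speedups(times).items():
            print(f"  {workers:>2} workers: {times[workers]:8.3f} s ({factor:.2f}X)")
```

Printing the factors next to the raw times makes it easy to see where adding workers stops helping in each test case.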

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
