Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-08-01 04:24:54
Message-ID: CALDaNm2EYd67r3NwScaFh9_onbX_vpKJVS9p-=+TX22q47m+Zg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The patches were not applying because of the recent commits.
I have rebased the patch over head & attached.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

On Thu, Jul 23, 2020 at 6:07 PM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Thu, Jul 23, 2020 at 9:22 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> >>
> >> I ran tests for partitioned use cases - results are similar to that of
> non partitioned cases[1].
> >
> >
> > I could see the gain up to 10-11 times for non-partitioned cases [1],
> can we use similar test case here as well (with one of the indexes on text
> column or having gist index) to see its impact?
> >
> > [1] -
> https://www.postgresql.org/message-id/CALj2ACVR4WE98Per1H7ajosW8vafN16548O2UV8bG3p4D3XnPg%40mail.gmail.com
> >
>
> Thanks Amit! Please find the results of detailed testing done for
> partitioned use cases:
>
> Range Partitions: consecutive rows go into the same partitions.
> parallel workers test case 1(exec time in sec): copy from csv file, 2
> indexes on integer columns and 1 index on text column, 4 range partitions test
> case 2(exec time in sec): copy from csv file, 1 gist index on text column,
> 4 range partitions test case 3(exec time in sec): copy from csv file, 3
> indexes on integer columns, 4 range partitions
> 0 1051.924(1X) 785.052(1X) 205.403(1X)
> 2 589.576(1.78X) 421.974(1.86X) 114.724(1.79X)
> 4 321.960(3.27X) 230.997(3.4X) 99.017(2.07X)
> 8 199.245(5.23X) *156.132(5.02X)* 99.722(2.06X)
> 16 127.343(8.26X) 173.696(4.52X) 98.147(2.09X)
> 20 *122.029(8.62X)* 186.418(4.21X) 97.723(2.1X)
> 30 142.876(7.36X) 214.598(3.66X) *97.048(2.11X)*
>
> On Thu, Jul 23, 2020 at 10:21 AM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
> wrote:
> >
> > I think, when doing the performance testing for partitioned table, it
> would be good to also mention about the distribution of data in the input
> file. One possible data distribution could be that we have let's say 100
> tuples in the input file, and every consecutive tuple belongs to a
> different partition.
> >
>
> To address Ashutosh's point, I used hash partitioning. Hope this helps to
> clear the doubt.
>
> Hash Partitions: where there are high chances that consecutive rows may go
> into different partitions.
> parallel workers test case 1(exec time in sec): copy from csv file, 2
> indexes on integer columns and 1 index on text column, 4 hash partitions test
> case 2(exec time in sec): copy from csv file, 1 gist index on text column,
> 4 hash partitions test case 3(exec time in sec): copy from csv file, 3
> indexes on integer columns, 4 hash partitions
> 0 1060.884(1X) 812.283(1X) 207.745(1X)
> 2 572.542(1.85X) 418.454(1.94X) 107.850(1.93X)
> 4 298.132(3.56X) 227.367(3.57X) *83.895(2.48X)*
> 8 169.449(6.26X) 137.993(5.89X) 85.411(2.43X)
> 16 112.297(9.45X) 95.167(8.53X) 96.136(2.16X)
> 20 *101.546(10.45X)* *90.552(8.97X)* 97.066(2.14X)
> 30 113.877(9.32X) 127.17(6.38X) 96.819(2.14X)
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>

Attachment Content-Type Size
v2-0001-Copy-code-readjustment-to-support-parallel-copy.patch text/x-patch 16.6 KB
v2-0002-Framework-for-leader-worker-in-parallel-copy.patch text/x-patch 31.5 KB
v2-0003-Allow-copy-from-command-to-process-data-from-file.patch text/x-patch 43.0 KB
v2-0004-Documentation-for-parallel-copy.patch text/x-patch 2.0 KB
v2-0005-Tests-for-parallel-copy.patch text/x-patch 19.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-08-01 06:17:21 Re: display offset along with block number in vacuum errors
Previous Message Justin Pryzby 2020-08-01 03:31:13 Re: expose parallel leader in CSV and log_line_prefix