Re: Parallel copy

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-07-23 12:37:14
Message-ID: CALj2ACV+f1Ph9m7u9pFJeeSM+shnpycNgU8GDL6O_OOtosHW4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 23, 2020 at 9:22 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
>>
>> I ran tests for partitioned use cases - results are similar to that of
non partitioned cases[1].
>
>
> I could see the gain up to 10-11 times for non-partitioned cases [1], can
we use similar test case here as well (with one of the indexes on text
column or having gist index) to see its impact?
>
> [1] -
https://www.postgresql.org/message-id/CALj2ACVR4WE98Per1H7ajosW8vafN16548O2UV8bG3p4D3XnPg%40mail.gmail.com
>

Thanks Amit! Please find the results of detailed testing done for
partitioned use cases:

Range Partitions: consecutive rows go into the same partitions.
parallel workers test case 1(exec time in sec): copy from csv file, 2
indexes on integer columns and 1 index on text column, 4 range partitions test
case 2(exec time in sec): copy from csv file, 1 gist index on text column,
4 range partitions test case 3(exec time in sec): copy from csv file, 3
indexes on integer columns, 4 range partitions
0 1051.924(1X) 785.052(1X) 205.403(1X)
2 589.576(1.78X) 421.974(1.86X) 114.724(1.79X)
4 321.960(3.27X) 230.997(3.4X) 99.017(2.07X)
8 199.245(5.23X) *156.132(5.02X)* 99.722(2.06X)
16 127.343(8.26X) 173.696(4.52X) 98.147(2.09X)
20 *122.029(8.62X)* 186.418(4.21X) 97.723(2.1X)
30 142.876(7.36X) 214.598(3.66X) *97.048(2.11X)*

On Thu, Jul 23, 2020 at 10:21 AM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
wrote:
>
> I think, when doing the performance testing for partitioned table, it
would be good to also mention about the distribution of data in the input
file. One possible data distribution could be that we have let's say 100
tuples in the input file, and every consecutive tuple belongs to a
different partition.
>

To address Ashutosh's point, I used hash partitioning. Hope this helps to
clear the doubt.

Hash Partitions: where there are high chances that consecutive rows may go
into different partitions.
parallel workers test case 1(exec time in sec): copy from csv file, 2
indexes on integer columns and 1 index on text column, 4 hash partitions test
case 2(exec time in sec): copy from csv file, 1 gist index on text column,
4 hash partitions test case 3(exec time in sec): copy from csv file, 3
indexes on integer columns, 4 hash partitions
0 1060.884(1X) 812.283(1X) 207.745(1X)
2 572.542(1.85X) 418.454(1.94X) 107.850(1.93X)
4 298.132(3.56X) 227.367(3.57X) *83.895(2.48X)*
8 169.449(6.26X) 137.993(5.89X) 85.411(2.43X)
16 112.297(9.45X) 95.167(8.53X) 96.136(2.16X)
20 *101.546(10.45X)* *90.552(8.97X)* 97.066(2.14X)
30 113.877(9.32X) 127.17(6.38X) 96.819(2.14X)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ahsan Hadi 2020-07-23 13:16:02 Re: Add session statistics to pg_stat_database
Previous Message Konstantin Knizhnik 2020-07-23 12:35:42 Re: Why it is not possible to create custom AM which behaves similar to btree?