Re: Parallel copy

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: Parallel copy
Date: 2020-09-03 06:50:31
Message-ID: CAJcOf-fea8MrCV+2M=w8jMjp3L7UwJWtEyFsRXV5-gMHxr89CQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 2, 2020 at 3:40 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> I have attached the scripts that I used for the test results I
> mentioned in my previous mail. create.sql file has the table that I
> used, insert_data_gen.txt has the insert data generation scripts. I
> varied the count in insert_data_gen to generate csv files of 1GB, 2GB
> & 5GB & varied the data to generate 1 char, 10 char & 100 char for
> each column for various testing. You can rename insert_data_gen.txt to
> insert_data_gen.sh & generate the csv file.

Hi Vignesh,

I used your script and table definition, multiplying the number of
records to produce 5GB and 9.5GB CSV files.
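
For reference, the generation step can also be done entirely in SQL;
the sketch below is not your insert_data_gen script, and the column
list and repeat() widths are placeholders standing in for the 1, 10
and 100 char variations you describe:

  COPY (
      SELECT g AS id,
             repeat('x', 10)  AS col_10char,
             repeat('y', 100) AS col_100char
      FROM generate_series(1, 530000) AS g
  ) TO '/tmp/parallel_copy_test.csv' WITH (FORMAT csv);

Scaling the generate_series bound scales the file to the target size.
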
I got the following results:

(1) Postgres default settings, 5GB CSV (530000 rows):

Copy Type        Duration (s)   Load factor
===============================================
Normal Copy         132.197          -

Parallel Copy
(#workers)
 1                   98.428         1.34
 2                   52.753         2.51
 3                   37.630         3.51
 4                   33.554         3.94
 5                   33.636         3.93
 6                   33.821         3.91
 7                   34.270         3.86
 8                   34.465         3.84
 9                   34.315         3.85
10                   33.543         3.94
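
"Load factor" above is the normal-copy duration divided by the
parallel-copy duration, e.g. 132.197 / 98.428 = 1.34. For anyone
reproducing, each run would look something like the following sketch,
assuming the COPY ... WITH (PARALLEL n) option syntax proposed in this
patch series (not in released PostgreSQL); the table and file names
are placeholders:

  \timing on
  TRUNCATE test_table;
  COPY test_table FROM '/tmp/parallel_copy_test.csv' WITH (FORMAT csv);
  TRUNCATE test_table;
  COPY test_table FROM '/tmp/parallel_copy_test.csv' WITH (FORMAT csv, PARALLEL 4);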

(2) Postgres increased resources, 5GB CSV (530000 rows):

shared_buffers = 20% of RAM (total RAM = 376GB) = 76GB
work_mem = 10% of RAM = 38GB
maintenance_work_mem = 10% of RAM = 38GB
max_worker_processes = 16
max_parallel_workers = 16
checkpoint_timeout = 30min
max_wal_size = 2GB
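
If reproducing, one way to apply these settings is via ALTER SYSTEM;
note that shared_buffers and max_worker_processes only take effect
after a server restart, while the rest take effect on a reload:

  ALTER SYSTEM SET shared_buffers = '76GB';
  ALTER SYSTEM SET work_mem = '38GB';
  ALTER SYSTEM SET maintenance_work_mem = '38GB';
  ALTER SYSTEM SET max_worker_processes = 16;
  ALTER SYSTEM SET max_parallel_workers = 16;
  ALTER SYSTEM SET checkpoint_timeout = '30min';
  ALTER SYSTEM SET max_wal_size = '2GB';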

Copy Type        Duration (s)   Load factor
===============================================
Normal Copy         131.835          -

Parallel Copy
(#workers)
 1                   98.301         1.34
 2                   53.261         2.48
 3                   37.868         3.48
 4                   34.224         3.85
 5                   33.831         3.90
 6                   34.229         3.85
 7                   34.512         3.82
 8                   34.303         3.84
 9                   34.690         3.80
10                   34.479         3.82

(3) Postgres default settings, 9.5GB CSV (1000000 rows):

Copy Type        Duration (s)   Load factor
===============================================
Normal Copy         248.503          -

Parallel Copy
(#workers)
 1                  185.724         1.34
 2                   99.832         2.49
 3                   70.560         3.52
 4                   63.328         3.92
 5                   63.182         3.93
 6                   64.108         3.88
 7                   64.131         3.87
 8                   64.350         3.86
 9                   64.293         3.87
10                   63.818         3.89

(4) Postgres increased resources, 9.5GB CSV (1000000 rows):

shared_buffers = 20% of RAM (total RAM = 376GB) = 76GB
work_mem = 10% of RAM = 38GB
maintenance_work_mem = 10% of RAM = 38GB
max_worker_processes = 16
max_parallel_workers = 16
checkpoint_timeout = 30min
max_wal_size = 2GB

Copy Type        Duration (s)   Load factor
===============================================
Normal Copy         248.647          -

Parallel Copy
(#workers)
 1                  182.236         1.36
 2                   92.814         2.68
 3                   67.347         3.69
 4                   63.839         3.89
 5                   62.672         3.97
 6                   63.873         3.89
 7                   64.930         3.83
 8                   63.885         3.89
 9                   62.397         3.98
10                   64.477         3.86

So, as you found, with this particular table definition and data, 1
parallel worker always performs better than normal copy.
The difference from the results I reported earlier seems to be caused
by the following factors:
- a different table definition (my earlier tests used a variety of
column types)
- the amount of data per row (my earlier tests used less data per row,
so more rows for the same size of data file)

As I previously observed, if the target table has no indexes,
increasing resources beyond the default settings makes little
difference to the performance.

Regards,
Greg Nancarrow
Fujitsu Australia
