Re: Parallel copy

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-10-14 11:35:07
Message-ID: CALj2ACWeQVd-xoQZHGT01_33St4xPoZQibWz46o7jW1PE3XOqQ@mail.gmail.com
Lists: pgsql-hackers

I did performance testing on the v7 patch set [1] with a custom
postgresql.conf [2]. The results are triplets of the form (execution
time in sec, number of workers, gain), where the gain is relative to
the 0-worker (non-parallel) run.

Use case 1: 10 million rows, 5.2GB data, 2 indexes on integer columns,
1 index on text column, binary file
(1104.898, 0, 1X), (1112.221, 1, 1X), (640.236, 2, 1.72X),
(335.090, 4, 3.3X), (200.492, 8, 5.51X), (131.448, 16, 8.4X),
(121.832, 20, 9.1X), (124.287, 30, 8.9X)

Use case 2: 10 million rows, 5.2GB data, 2 indexes on integer columns,
1 index on text column, COPY FROM STDIN, CSV format
(1203.282, 0, 1X), (1135.517, 1, 1.06X), (655.140, 2, 1.84X),
(343.688, 4, 3.5X), (203.742, 8, 5.9X), (144.793, 16, 8.31X),
(133.339, 20, 9.02X), (136.672, 30, 8.8X)

Use case 3: 10 million rows, 5.2GB data, 2 indexes on integer columns,
1 index on text column, text file
(1165.991, 0, 1X), (1128.599, 1, 1.03X), (644.793, 2, 1.81X),
(342.813, 4, 3.4X), (204.279, 8, 5.71X), (139.986, 16, 8.33X),
(128.259, 20, 9.1X), (132.764, 30, 8.78X)

The above results are similar to those with earlier versions of the patch set.

On Fri, Oct 9, 2020 at 3:26 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> Sure, you need to change the code such that when force_parallel_mode =
> 'regress' is specified then it always uses one worker. This is
> primarily for testing purposes and will help during the development of
> this patch as it will make all existing Copy tests to use quite a good
> portion of the parallel infrastructure.
>

I ran the tests with force_parallel_mode = regress and found 2 issues;
the fixes for them are in the v7 patch set [1].

>
> > Overall, we have below test cases to cover the code and for performance measurements. We plan to run these tests whenever a new set of patches is posted.
> >
> > 1. csv
> > 2. binary
>
> Don't we need the tests for plain text files as well?
>

I added a text-format use case; the performance results above are from the v7 patch set [1].

>
> > 3. force parallel mode = regress
> > 4. toast data csv and binary
> > 5. foreign key check, before row, after row, before statement, after statement, instead of triggers
> > 6. partition case
> > 7. foreign partitions and partitions having trigger cases
> > 8. where clause having parallel unsafe and safe expression, default parallel unsafe and safe expression
> > 9. temp, global, local, unlogged, inherited tables cases, foreign tables
> >
>
> Sounds like good coverage. So, are you doing all this testing
> manually? How are you maintaining these tests?
>

All the test cases listed above, except those meant to measure
performance gains with huge data, are in the v7-0005 patch of the v7
patch set [1].

[1] https://www.postgresql.org/message-id/CALDaNm1n1xW43neXSGs%3Dc7zt-mj%2BJHHbubWBVDYT9NfCoF8TuQ%40mail.gmail.com

[2]
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
