From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-10-09 09:22:22
Message-ID: CALj2ACWwSEELvf9H2EsuDS8=mfYmaODCJhGJLfCuBXxu6076Zg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> From the testing perspective,
> 1. Test by having something force_parallel_mode = regress which means
> that all existing Copy tests in the regression will be executed via
> new worker code. You can have this as a test-only patch for now and
> make sure all existing tests passed with this.
>

I don't think all the existing copy test cases (except the new test cases
added in the parallel copy patch set) would run inside a parallel worker
even with force_parallel_mode on. This is because parallelism is picked up
for parallel copy only when the PARALLEL option is specified, unlike
parallelism for SELECT queries.
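
For reference, a parallel copy is triggered only by that explicit option,
along the lines of the following (illustrative; the exact option syntax is
as proposed in this patch set, and the file path is hypothetical):

COPY test_tbl FROM '/tmp/test_data.csv' WITH (FORMAT csv, PARALLEL 4);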

Anyway, I ran with force_parallel_mode set to regress. All copy-related
tests and make check/make check-world passed.
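
For reference, one way to run the whole suite under that GUC is the
TEMP_CONFIG hook that make check honors (a sketch; the file path is
hypothetical):

echo "force_parallel_mode = regress" > /tmp/force_parallel.conf
TEMP_CONFIG=/tmp/force_parallel.conf make check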

>
> 2. Do we have tests for toast tables? I think if you implement the
> previous point some existing tests might cover it but I feel we should
> have at least one or two tests for the same.
>

The results below are triplets of the form (exec time in sec, number of
workers, gain over serial copy).

Toast table use case 1: 10000 tuples, 9.6GB data, 3 indexes (2 on integer
columns, 1 on a text column that is not the toast column), csv file, each
row > 1320KB:
(222.767, 0, 1X), (134.171, 1, 1.66X), (93.749, 2, 2.38X), (93.672, 4,
2.38X), (94.827, 8, 2.35X), (93.766, 16, 2.37X), (98.153, 20, 2.27X),
(122.721, 30, 1.81X)

Toast table use case 2: 100000 tuples, 96GB data, 3 indexes (2 on integer
columns, 1 on a text column that is not the toast column), csv file, each
row > 1320KB:
(2255.032, 0, 1X), (1358.628, 1, 1.66X), (901.170, 2, 2.5X), (912.743, 4,
2.47X), (988.718, 8, 2.28X), (938.000, 16, 2.4X), (997.556, 20, 2.26X),
(1000.586, 30, 2.25X)

Toast table use case 3: 10000 tuples, 9.6GB data, no indexes, binary file,
each row > 1320KB:
(136.983, 0, 1X), (136.418, 1, 1X), (81.896, 2, 1.66X), (62.929, 4, 2.16X),
(52.311, 8, 2.6X), (40.032, 16, 3.49X), (44.097, 20, 3.09X), (62.310, 30,
2.18X)

For toast tables, we could achieve up to 2.5X gain for csv files and 3.5X
for binary files. We are analyzing these numbers and will post an update on
our findings soon.
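
To give an idea of the setup, here is a minimal sketch of the kind of schema
used for the toast tests (table and column names are illustrative, not the
exact test script):

CREATE TABLE toast_tbl (id1 int, id2 int, short_txt text, long_txt text);
CREATE INDEX ON toast_tbl (id1);
CREATE INDEX ON toast_tbl (id2);
CREATE INDEX ON toast_tbl (short_txt); -- text index, not on the toasted column
-- long_txt values exceed 1MB per row, so they are stored out of line
COPY toast_tbl FROM '/tmp/toast_data.csv' WITH (FORMAT csv, PARALLEL 2);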

While testing the toast table case with a binary file, I discovered an
issue with the earlier v6-0006-Parallel-Copy-For-Binary-Format-Files.patch
from [1]. I fixed it and attached the updated v6-0006 patch here. Please
note that I'm also attaching patches 1 to 5 from version 6 for completeness;
they are unchanged from what Vignesh sent earlier in [1].

>
> 3. Have we checked the code coverage of the newly added code with
> existing tests?
>

So far, we have manually ensured that most of the code paths are covered
(see the list of test cases below). We are also planning to measure code
coverage with a tool in the coming days.
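
For the tool-based measurement, PostgreSQL's build system has built-in gcov
support (as described in the docs on test coverage examination), roughly:

./configure --enable-coverage
make
make check
make coverage-html    # writes an HTML report under the build tree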

Apart from the above tests, I also captured performance measurements on the
latest v6 patch set.

Use case 1: 10 million rows, 5.2GB data, 2 indexes on integer columns, 1
index on a text column, csv file:
(1168.484, 0, 1X), (1116.442, 1, 1.05X), (641.272, 2, 1.82X), (338.963, 4,
3.45X), (202.914, 8, 5.76X), (139.884, 16, 8.35X), (128.955, 20, 9.06X),
(131.898, 30, 8.86X)

Use case 2: 10 million rows, 5.2GB data, 2 indexes on integer columns, 1
index on a text column, binary file:
(1097.83, 0, 1X), (1095.735, 1, 1.002X), (625.610, 2, 1.75X), (319.833, 4,
3.43X), (186.908, 8, 5.87X), (132.115, 16, 8.31X), (128.854, 20, 8.52X),
(134.965, 30, 8.13X)

Use case 3: 10 million rows, 5.2GB data, 3 indexes on integer columns, csv
file:
(218.227, 0, 1X), (182.815, 1, 1.19X), (135.500, 2, 1.61X), (113.954, 4,
1.91X), (106.243, 8, 2.05X), (101.222, 16, 2.15X), (100.378, 20, 2.17X),
(100.351, 30, 2.17X)

All the above tests were performed on the latest v6 patch set (attached to
this thread) with a custom postgresql.conf [2]. As above, the results are
triplets of the form (exec time in sec, number of workers, gain); for
example, with 20 workers in use case 1, the gain is 1168.484/128.955, i.e.
about 9.06X.

Overall, we have the below test cases to cover the code and for performance
measurements. We plan to run these tests whenever a new set of patches is
posted. A sketch of the where clause case (item 8) follows the list.

1. csv
2. binary
3. force_parallel_mode = regress
4. toast data, csv and binary
5. foreign key checks; before row, after row, before statement, after
statement, and instead of triggers
6. partition cases
7. foreign partitions and partitions having triggers
8. where clause having parallel safe and unsafe expressions, default
expressions that are parallel safe and unsafe
9. temp, global, local, unlogged, and inherited tables; foreign tables
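
As an example of what these tests look like, here is a sketch of test case 8
(where clause with parallel safe/unsafe expressions). Names and the file
path are hypothetical, and the expected fallback to serial copy for the
unsafe case is as per this patch set's design:

CREATE TABLE t (a int, b text);

CREATE FUNCTION chk_safe(i int) RETURNS boolean
LANGUAGE sql PARALLEL SAFE AS $$ SELECT i % 2 = 0 $$;

CREATE FUNCTION chk_unsafe(i int) RETURNS boolean
LANGUAGE sql PARALLEL UNSAFE AS $$ SELECT i % 2 = 0 $$;

-- parallel safe predicate: workers can be used
COPY t FROM '/tmp/data.csv' WITH (FORMAT csv, PARALLEL 2) WHERE chk_safe(a);
-- parallel unsafe predicate: expected to fall back to serial copy
COPY t FROM '/tmp/data.csv' WITH (FORMAT csv, PARALLEL 2) WHERE chk_unsafe(a);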

[1]
https://www.postgresql.org/message-id/CALDaNm29DJKy0-vozs8eeBRf2u3rbvPdZHCocrd0VjoWHS7h5A%40mail.gmail.com
[2]
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v6-0001-Copy-code-readjustment-to-support-parallel-copy.patch application/octet-stream 16.3 KB
v6-0002-Framework-for-leader-worker-in-parallel-copy.patch application/octet-stream 29.4 KB
v6-0003-Allow-copy-from-command-to-process-data-from-file.patch application/octet-stream 63.8 KB
v6-0004-Documentation-for-parallel-copy.patch application/octet-stream 2.7 KB
v6-0005-Tests-for-parallel-copy.patch application/octet-stream 19.7 KB
v6-0006-Parallel-Copy-For-Binary-Format-Files.patch application/octet-stream 27.8 KB
