Re: Parallel copy

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-09-22 10:38:46
Message-ID: CALj2ACXQPiPhqShCgdPM3_PZqydxPHvefg3E1N5kNBh2jD08mQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 17, 2020 at 11:06 AM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Wed, Sep 16, 2020 at 1:20 PM Greg Nancarrow <gregn4422(at)gmail(dot)com>
wrote:
> >
> > Fortunately I have been given permission to share the exact table
> > definition and data I used, so you can check the behaviour and timings
> > on your own test machine.
> >
>
> Thanks Greg for the script. I ran your test case and I didn't observe
> any increase in exec time with 1 worker, indeed, we have benefitted a
> few seconds from 0 to 1 worker as expected.
>
> Execution time is in seconds. Each test case is executed 3 times on
> release build. Each time the data directory is recreated.
>
> Case 1: 1000000 rows, 2GB
> With Patch, default configuration, 0 worker: 88.933, 92.261, 88.423
> With Patch, default configuration, 1 worker: 73.825, 74.583, 72.678
>
> With Patch, custom configuration, 0 worker: 76.191, 78.160, 78.822
> With Patch, custom configuration, 1 worker: 61.289, 61.288, 60.573
>
> Case 2: 2550000 rows, 5GB
> With Patch, default configuration, 0 worker: 246.031, 188.323, 216.683
> With Patch, default configuration, 1 worker: 156.299, 153.293, 170.307
>
> With Patch, custom configuration, 0 worker: 197.234, 195.866, 196.049
> With Patch, custom configuration, 1 worker: 157.173, 158.287, 157.090
>

Hi Greg,

If you still observe the issue in your testing environment, I'm attaching a
testing patch(that applies on top of the latest parallel copy patch set
i.e. v5 1 to 6) to capture various timings such as total copy time in
leader and worker, index and table insertion time, leader and worker
waiting time. These logs are shown in the server log file.

Few things to follow before testing:
1. Is the table being dropped/truncated after the test with 0 workers and
before running with 1 worker? If not, then the index insertion time would
increase.[1](for me it is increasing by 10 sec). This is obvious because
the 1st time index will be created from bottom up manner(from leaves to
root), but for the 2nd time it has to search and insert at the proper
leaves and inner B+Tree nodes.
2. If possible, can you also run with custom postgresql.conf settings[2]
along with default? Just to ensure that other bg processes such as
checkpointer, autovacuum, bgwriter etc. don't affect our testcase. For
instance, with default postgresql.conf file, it looks like checkpointing[3]
is happening frequently, could you please let us know if that happens at
your end?
3. Could you please run the test case 3 times at least? Just to ensure the
consistency of the issue.
4. I ran the tests in a performance test system where no other user
processes(except system processes) are running. Is it possible for you to
do the same?

Please capture and share the timing logs with us.

Here's a snapshot of how the added timings show up in the logs: ( I
captured this with your test case case 1: 1000000 rows, 2GB, custom
postgresql.conf file settings[2]).
with 0 workers:
2020-09-22 10:49:27.508 BST [163910] LOG: totaltableinsertiontime =
24072.034 ms
2020-09-22 10:49:27.508 BST [163910] LOG: totalindexinsertiontime = 60.682
ms
2020-09-22 10:49:27.508 BST [163910] LOG: totalcopytime = 59664.594 ms

with 1 worker:
2020-09-22 10:53:58.409 BST [163947] LOG: totalcopyworkerwaitingtime =
59.815 ms
2020-09-22 10:53:58.409 BST [163947] LOG: totaltableinsertiontime =
23585.881 ms
2020-09-22 10:53:58.409 BST [163947] LOG: totalindexinsertiontime = 30.946
ms
2020-09-22 10:53:58.409 BST [163947] LOG: totalcopytimeworker = 47047.956
ms
2020-09-22 10:53:58.429 BST [163946] LOG: totalcopyleaderwaitingtime =
26746.744 ms
2020-09-22 10:53:58.429 BST [163946] LOG: totalcopytime = 47150.002 ms

[1]
0 worker:
LOG: totaltableinsertiontime = 25491.881 ms
LOG: totalindexinsertiontime = 14136.104 ms
LOG: totalcopytime = 75606.858 ms
table is not dropped and so are indexes
1 worker:
LOG: totalcopyworkerwaitingtime = 64.582 ms
LOG: totaltableinsertiontime = 21360.875 ms
LOG: totalindexinsertiontime = 24843.570 ms
LOG: totalcopytimeworker = 69837.162 ms
LOG: totalcopyleaderwaitingtime = 49548.441 ms
LOG: totalcopytime = 69997.778 ms

[2]
custom postgresql.conf configuration:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off

[3]
LOG: checkpoints are occurring too frequently (14 seconds apart)
HINT: Consider increasing the configuration parameter "max_wal_size".

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-0001-Parallel-Copy-Exec-Time-Capture.patch application/octet-stream 7.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-09-22 11:44:54 Re: Logical replication from PG v13 and below to PG v14 (devel version) is not working.
Previous Message Peter Eisentraut 2020-09-22 09:37:57 Re: OpenSSL 3.0.0 compatibility