Re: Parallel copy

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-06-24 08:10:29
Message-ID: CALj2ACW94icER3WrWapon7JkcX8j0TGRue5ycWMTEvgA3X7fOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks Vignesh for reviewing parallel copy for binary format files
patch. I tried to address the comments in the attached patch
(0006-Parallel-Copy-For-Binary-Format-Files.patch).

On Thu, Jun 18, 2020 at 6:42 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, Jun 15, 2020 at 4:39 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > The above tests were run with the configuration attached config.txt, which is the same used for performance tests of csv/text files posted earlier in this mail chain.
> >
> > Request the community to take this patch up for review along with the parallel copy for csv/text file patches and provide feedback.
> >
>
> I had reviewed the patch, few comments:
>
> The new members added should be present in ParallelCopyData

Added to ParallelCopyData.

>
> line_size can be set as and when we process the tuple from
> CopyReadBinaryTupleLeader and this can be set at the end. That way the
> above code can be removed.
>

curr_tuple_start_info and curr_tuple_end_info variables are now local
variables to CopyReadBinaryTupleLeader and the line size calculation
code is moved to CopyReadBinaryAttributeLeader.

>
> curr_block_pos variable is present in ParallelCopyShmInfo, we could
> use it and remove from here.
> curr_data_offset, similar variable raw_buf_index is present in
> CopyStateData, we could use it and remove from here.
>

Yes, making use of them now.

>
> This code is duplicate in CopyReadBinaryTupleLeader &
> CopyReadBinaryAttributeLeader. We could make a function and re-use.
>

Added a new function AdjustFieldInfo.

>
> column_no is not used, it can be removed
>

Removed.

>
> The above code is present in NextCopyFrom & CopyReadBinaryTupleLeader,
> check if we can make a common function or we could use NextCopyFrom as
> it is.
>

Added a macro CHECK_FIELD_COUNT.

> + if (fld_count == -1)
> + {
> + return true;
> + }
>
> Should this be an assert in CopyReadBinaryTupleWorker function as this
> check is already done in the leader.
>

This check in leader signifies the end of the file. For the workers,
the eof is when GetLinePosition() returns -1.
line_pos = GetLinePosition(cstate);
if (line_pos == -1)
return true;
In case the if (fld_count == -1) is encountered in the worker, workers
should just return true from CopyReadBinaryTupleWorker marking eof.
Having this as an assert doesn't serve the purpose I feel.

Along with the review comments addressed
patch(0006-Parallel-Copy-For-Binary-Format-Files.patch) also attaching
all other latest series of patches(0001 to 0005) from [1], the order
of applying patches is from 0001 to 0006.

[1] https://www.postgresql.org/message-id/CALDaNm0H3N9gK7CMheoaXkO99g%3DuAPA93nSZXu0xDarPyPY6sg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0006-Parallel-Copy-For-Binary-Format-Files.patch application/octet-stream 24.7 KB
0001-Copy-code-readjustment-to-support-parallel-copy.patch application/octet-stream 16.8 KB
0002-Framework-for-leader-worker-in-parallel-copy.patch application/octet-stream 33.1 KB
0003-Allow-copy-from-command-to-process-data-from-file-ST.patch application/octet-stream 40.4 KB
0004-Documentation-for-parallel-copy.patch application/octet-stream 2.0 KB
0005-Tests-for-parallel-copy.patch application/octet-stream 20.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2020-06-24 08:12:38 Re: Allow CURRENT_ROLE in GRANTED BY
Previous Message Peter Eisentraut 2020-06-24 08:00:00 Re: Improve handling of parameter differences in physical replication