Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-10-07 18:44:00
Message-ID: CALDaNm29DJKy0-vozs8eeBRf2u3rbvPdZHCocrd0VjoWHS7h5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jul 22, 2020 at 7:48 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Tue, Jul 21, 2020 at 3:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> >
> > > Review comments:
> > > ===================
> > >
> > > 0001-Copy-code-readjustment-to-support-parallel-copy
> > > 1.
> > > @@ -807,8 +835,11 @@ CopyLoadRawBuf(CopyState cstate)
> > > else
> > > nbytes = 0; /* no data need be saved */
> > >
> > > + if (cstate->copy_dest == COPY_NEW_FE)
> > > + minread = RAW_BUF_SIZE - nbytes;
> > > +
> > > inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
> > > - 1, RAW_BUF_SIZE - nbytes);
> > > + minread, RAW_BUF_SIZE - nbytes);
> > >
> > > No comment to explain why this change is done?
> > >
> > > 0002-Framework-for-leader-worker-in-parallel-copy
> >
> > Currently CopyGetData copies a lesser amount of data to buffer even though space is available in buffer because minread was passed as 1 to CopyGetData. Because of this there are frequent call to CopyGetData for fetching the data. In this case it will load only some data due to the below check:
> > while (maxread > 0 && bytesread < minread && !cstate->reached_eof)
> > After reading some data bytesread will be greater than minread which is passed as 1 and return with lesser amount of data, even though there is some space.
> > This change is required for parallel copy feature as each time we get a new DSM data block which is of 64K size and copy the data. If we copy less data into DSM data blocks we might end up consuming all the DSM data blocks.
> >
>
> Why can't we reuse the DSM block which has unfilled space?
>
> > I felt this issue can be fixed as part of HEAD. Have posted a separate thread [1] for this. I'm planning to remove that change once it gets committed. Can that go as a separate
> > patch or should we include it here?
> > [1] - https://www.postgresql.org/message-id/CALDaNm0v4CjmvSnftYnx_9pOS_dKRG%3DO3NnBgJsQmi0KipvLog%40mail.gmail.com
> >
>
> I am convinced by the reason given by Kyotaro-San in that another
> thread [1] and performance data shown by Peter that this can't be an
> independent improvement and rather in some cases it can do harm. Now,
> if you need it for a parallel-copy path then we can change it
> specifically to the parallel-copy code path but I don't understand
> your reason completely.
>

Whenever we need data to be populated, we will get a new data block &
pass it to CopyGetData to populate the data. In case of file copy, the
server will completely fill the data block. We expect the data to be
filled completely. If data is available it will completely load the
complete data block in case of file copy. There is no scenario where
even if data is present a partial data block will be returned except
for EOF or no data available. But in case of STDIN data copy, even
though there is 8K data available in data block & 8K data available in
STDIN, CopyGetData will return as soon as libpq buffer data is more
than the minread. We will pass new data block every time to load data.
Every time we pass an 8K data block but CopyGetData loads a few bytes
in the new data block & returns. I wanted to keep the same data
population logic for both file copy & STDIN copy i.e copy full 8K data
blocks & then the populated data can be required. There is an
alternative solution I can have some special handling in case of STDIN
wherein the existing data block can be passed with the index from
where the data should be copied. Thoughts?

> > > 2.
> ..
> > > + */
> > > +typedef struct ParallelCopyLineBoundary
> > >
> > > Are we doing all this state management to avoid using locks while
> > > processing lines? If so, I think we can use either spinlock or LWLock
> > > to keep the main patch simple and then provide a later patch to make
> > > it lock-less. This will allow us to first focus on the main design of
> > > the patch rather than trying to make this datastructure processing
> > > lock-less in the best possible way.
> > >
> >
> > The steps will be more or less same if we use spinlock too. step 1, step 3 & step 4 will be common we have to use lock & unlock instead of step 2 & step 5. I feel we can retain the current implementation.
> >
>
> I'll study this in detail and let you know my opinion on the same but
> in the meantime, I don't follow one part of this comment: "If they
> don't follow this order the worker might process wrong line_size and
> leader might populate the information which worker has not yet
> processed or in the process of processing."
>
> Do you want to say that leader might overwrite some information which
> worker hasn't read yet? If so, it is not clear from the comment.
> Another minor point about this comment:
>

Here leader and worker must follow these steps to avoid any corruption
or hang issue. Changed it to:
* The leader & worker process access the shared line information by following
* the below steps to avoid any data corruption or hang:

> + * ParallelCopyLineBoundary is common data structure between leader & worker,
> + * Leader process will be populating data block, data block offset &
> the size of
>
> I think there should be a full-stop after worker instead of a comma.
>

Changed it.

> >
> > > 6.
> > > In function BeginParallelCopy(), you need to keep a provision to
> > > collect wal_usage and buf_usage stats. See _bt_begin_parallel for
> > > reference. Those will be required for pg_stat_statements.
> > >
> >
> > Fixed
> >
>
> How did you ensure that this is fixed? Have you tested it, if so
> please share the test? I see a basic problem with your fix.
>
> + /* Report WAL/buffer usage during parallel execution */
> + bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
> + walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
> + InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
> + &walusage[ParallelWorkerNumber]);
>
> You need to call InstrStartParallelQuery() before the actual operation
> starts, without that stats won't be accurate? Also, after calling
> WaitForParallelWorkersToFinish(), you need to accumulate the stats
> collected from workers which neither you have done nor is possible
> with the current code in your patch because you haven't made any
> provision to capture them in BeginParallelCopy.
>
> I suggest you look into lazy_parallel_vacuum_indexes() and
> begin_parallel_vacuum() to understand how the buffer/wal usage stats
> are accumulated. Also, please test this functionality using
> pg_stat_statements.
>

Made changes accordingly.
I have verified it using:
postgres=# select * from pg_stat_statements where query like '%copy%';
userid | dbid | queryid |
query
| plans | total_plan_time |
min_plan_time | max_plan_time | mean_plan_time | stddev_plan_time |
calls | total_exec_time | min_exec_time | max_exec_time |
mean_exec_time | stddev_exec_time | rows | shared_blks_hi
t | shared_blks_read | shared_blks_dirtied | shared_blks_written |
local_blks_hit | local_blks_read | local_blks_dirtied |
local_blks_written | temp_blks_read | temp_blks_written | blk_
read_time | blk_write_time | wal_records | wal_fpi | wal_bytes
--------+-------+----------------------+---------------------------------------------------------------------------------------------------------------------+-------+-----------------+-
--------------+---------------+----------------+------------------+-------+-----------------+---------------+---------------+----------------+------------------+--------+---------------
--+------------------+---------------------+---------------------+----------------+-----------------+--------------------+--------------------+----------------+-------------------+-----
----------+----------------+-------------+---------+-----------
10 | 13743 | -6947756673093447609 | copy hw from
'/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format
csv, delimiter ',') | 0 | 0 |
0 | 0 | 0 | 0 |
1 | 265.195105 | 265.195105 | 265.195105 | 265.195105
| 0 | 175000 | 191
6 | 0 | 946 | 946 |
0 | 0 | 0 | 0
| 0 | 0 |
0 | 0 | 1116 | 0 | 3587203
10 | 13743 | 8570215596364326047 | copy hw from
'/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format
csv, delimiter ',', parallel '2') | 0 | 0 |
0 | 0 | 0 | 0 |
1 | 35668.402482 | 35668.402482 | 35668.402482 | 35668.402482
| 0 | 175000 | 310
1 | 36 | 952 | 919 |
0 | 0 | 0 | 0
| 0 | 0 |
0 | 0 | 1119 | 6 | 3624405
(2 rows)

> >
> > > 0003-Allow-copy-from-command-to-process-data-from-file-ST
> > > 10.
> > > In the commit message, you have written "The leader does not
> > > participate in the insertion of data, leaders only responsibility will
> > > be to identify the lines as fast as possible for the workers to do the
> > > actual copy operation. The leader waits till all the lines populated
> > > are processed by the workers and exits."
> > >
> > > I think you should also mention that we have chosen this design based
> > > on the reason "that everything stalls if the leader doesn't accept
> > > further input data, as well as when there are no available splitted
> > > chunks so it doesn't seem like a good idea to have the leader do other
> > > work. This is backed by the performance data where we have seen that
> > > with 1 worker there is just a 5-10% (or whatever percentage difference
> > > you have seen) performance difference)".
> >
> > Fixed.
> >
>
> Make it a one-paragraph starting from "The leader does not participate
> in the insertion of data .... just a 5-10% performance difference".
> Right now both the parts look a bit disconnected.
>

Made the contents starting from "The leader does not" in a paragraph.

> Few additional comments:
> ======================
> v5-0001-Copy-code-readjustment-to-support-parallel-copy
> ---------------------------------------------------------------------------------
> 1.
> +/*
> + * CLEAR_EOL_LINE - Wrapper for clearing EOL.
> + */
> +#define CLEAR_EOL_LINE() \
> +if (!result && !IsHeaderLine()) \
> + ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
> + cstate->line_buf.len, \
> + &cstate->line_buf.len) \
>
> I don't like this macro. I think it is sufficient to move the common
> code to be called from the parallel and non-parallel path in
> ClearEOLFromCopiedData but I think the other checks can be done
> in-place. I think having macros for such a thing makes code less
> readable.
>

I have removed the macro & called ClearEOLFromCopiedData directly
wherever required.

> 2.
> -
> +static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
> + List *attnamelist);
>
> Spurious line removal.
>

I have modified it to keep it as it is.

> v5-0002-Framework-for-leader-worker-in-parallel-copy
> ---------------------------------------------------------------------------
> 3.
> + FullTransactionId full_transaction_id; /* xid for copy from statement */
> + CommandId mycid; /* command id */
> + ParallelCopyLineBoundaries line_boundaries; /* line array */
> +} ParallelCopyShmInfo;
>
> We already serialize FullTransactionId and CommandId via
> InitializeParallelDSM->SerializeTransactionState. Can't we reuse it? I
> think recently Parallel Insert patch has also done something for this
> [2] so you can refer that if you want.
>

Changed it to remove setting of command id & full transaction id.
Added a function SetCurrentCommandIdUsedForWorker to set
currentCommandIdUsed to true & called GetCurrentCommandId by passing
!IsParallelCopy().

> v5-0004-Documentation-for-parallel-copy
> -----------------------------------------------------------
> 1. Perform <command>COPY FROM</command> in parallel using <replaceable
> + class="parameter"> integer</replaceable> background workers.
>
> No need for space before integer.
>

I have removed it.

Attached v6 patch with the fixes.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v6-0001-Copy-code-readjustment-to-support-parallel-copy.patch application/x-patch 16.3 KB
v6-0002-Framework-for-leader-worker-in-parallel-copy.patch application/x-patch 29.4 KB
v6-0003-Allow-copy-from-command-to-process-data-from-file.patch application/x-patch 63.8 KB
v6-0004-Documentation-for-parallel-copy.patch application/x-patch 2.7 KB
v6-0005-Tests-for-parallel-copy.patch application/x-patch 19.7 KB
v6-0006-Parallel-Copy-For-Binary-Format-Files.patch application/x-patch 27.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2020-10-07 18:48:42 Re: Parallel copy
Previous Message Robert Haas 2020-10-07 17:48:49 Re: [Patch] ALTER SYSTEM READ ONLY