Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-10-08 05:45:15
Message-ID: CALDaNm23aWceUF5YM1ituvALTFwVPKoEnAAGbKbUdtDVdyb3Cw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > Few additional comments:
> > ======================
>
> Some more comments:
>
> v5-0002-Framework-for-leader-worker-in-parallel-copy
> ===========================================
> 1.
> These values
> + * help in handover of multiple records with significant size of data to be
> + * processed by each of the workers to make sure there is no context
> switch & the
> + * work is fairly distributed among the workers.
>
> How about writing it as: "These values help in the handover of
> multiple records with the significant size of data to be processed by
> each of the workers. This also ensures there is no context switch and
> the work is fairly distributed among the workers."

Changed as suggested.

>
> 2. Can we keep WORKER_CHUNK_COUNT, MAX_BLOCKS_COUNT, and RINGSIZE as
> power-of-two? Say WORKER_CHUNK_COUNT as 64, MAX_BLOCK_COUNT as 1024,
> and accordingly choose RINGSIZE. At many places, we do that way. I
> think it can sometimes help in faster processing due to cache size
> requirements and in this case, I don't see a reason why we can't
> choose these values to be power-of-two. If you agree with this change
> then also do some performance testing after this change?
>

Modified as suggested. I have run a few performance tests and verified
that there is no degradation. We will post a performance run of this
separately in the coming days.
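
For reference, a minimal sketch of what the constants could look like
after this change; the values below are assumptions, the authoritative
definitions are in the v6 patch:

/*
 * Hypothetical power-of-two values; the actual definitions in the v6 patch
 * may differ.
 */
#define WORKER_CHUNK_COUNT	64		/* lines handed to a worker at a time */
#define MAX_BLOCKS_COUNT	1024	/* DSM data blocks available to the leader */
#define RINGSIZE	(MAX_BLOCKS_COUNT * 16)		/* line-boundary ring entries, still a power of two */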

> 3.
> + bool curr_blk_completed;
> + char data[DATA_BLOCK_SIZE]; /* data read from file */
> + uint8 skip_bytes;
> +} ParallelCopyDataBlock;
>
> Is there a reason to keep skip_bytes after data? Normally the variable
> size data is at the end of the structure. Also, there is no comment
> explaining the purpose of skip_bytes.
>

Modified as suggested and added comments.
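
For clarity, a rough sketch of the reordered struct; curr_blk_completed,
skip_bytes, data and DATA_BLOCK_SIZE come from the quoted snippet, while
the comment wording and any other detail is an assumption about the patch:

typedef struct ParallelCopyDataBlock
{
	/* Set by the leader once it has finished filling this block. */
	bool		curr_blk_completed;

	/*
	 * Bytes at the block boundary that belong to a record spanning into the
	 * adjacent block (illustrative description; the patch comment is
	 * authoritative).
	 */
	uint8		skip_bytes;

	/* Data read from the file; kept last as it is the largest member. */
	char		data[DATA_BLOCK_SIZE];
} ParallelCopyDataBlock;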

> 4.
> + * Copy data block information.
> + * ParallelCopyDataBlock's will be created in DSM. Data read from file will be
> + * copied in these DSM data blocks. The leader process identifies the records
> + * and the record information will be shared to the workers. The workers will
> + * insert the records into the table. There can be one or more number
> of records
> + * in each of the data block based on the record size.
> + */
> +typedef struct ParallelCopyDataBlock
>
> Keep one empty line after the description line like below. I also
> suggested to do a minor tweak in the above sentence which is as
> follows:
>
> * Copy data block information.
> *
> * These data blocks are created in DSM. Data read ...
>
> Try to follow a similar format in other comments as well.
>

Modified as suggested.

> 5. I think it is better to move parallelism related code to a new file
> (we can name it as copyParallel.c or something like that).
>

Modified. Added a new copyparallel.c file that contains the copy
parallelism functionality. Some of the function prototypes and data
structures were moved to the copy.h header file so that they can be
shared between copy.c and copyparallel.c.

> 6. copy.c(1648,25): warning C4133: 'function': incompatible types -
> from 'ParallelCopyLineState *' to 'uint32 *'
> Getting above compilation warning on Windows.
>

Modified the data type.
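
For illustration only (this is not the exact code at copy.c line 1648),
the usual shape of such a fix is to declare the local variable as uint32,
which is what the u32 atomics API expects, rather than as the enum type;
LINE_WORKER_PROCESSING below is a hypothetical state name:

	uint32		line_state;		/* was: ParallelCopyLineState line_state */

	/*
	 * pg_atomic_compare_exchange_u32() takes a uint32 *, so passing the
	 * address of an enum-typed variable triggers the Windows warning.
	 */
	line_state = LINE_LEADER_POPULATED;
	pg_atomic_compare_exchange_u32(&lineInfo->line_state, &line_state,
								   LINE_WORKER_PROCESSING);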

> v5-0003-Allow-copy-from-command-to-process-data-from-file
> ==================================================
> 1.
> @@ -4294,7 +5047,7 @@ BeginCopyFrom(ParseState *pstate,
> * only in text mode.
> */
> initStringInfo(&cstate->attribute_buf);
> - cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
> + cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *)
> palloc(RAW_BUF_SIZE + 1);
>
> Is there anyway IsParallelCopy can be true by this time? AFAICS, we do
> anything about parallelism after this. If you want to save this
> allocation then we need to move this after we determine that
> parallelism can be used or not and accordingly the below code in the
> patch needs to be changed.
>
> * ParallelCopyFrom - parallel copy leader's functionality.
> *
> * Leader executes the before statement for before statement trigger, if before
> @@ -1110,8 +1547,302 @@ ParallelCopyFrom(CopyState cstate)
> ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
> ereport(DEBUG1, (errmsg("Running parallel copy leader")));
>
> + /* raw_buf is not used in parallel copy, instead data blocks are used.*/
> + pfree(cstate->raw_buf);
> + cstate->raw_buf = NULL;
>

Removed the palloc change; raw_buf will be allocated for both parallel
and non-parallel copy. Another solution I considered was to move the
memory allocation to CopyFrom, but that might affect FDWs that use
BeginCopyFrom, NextCopyFrom & EndCopyFrom. So I have kept the
allocation in BeginCopyFrom and the freeing for parallel copy in
ParallelCopyFrom.

> Is there anything else also the allocation of which depends on parallelism?
>

I felt this is the only memory that sequential copy allocates which is
not required in parallel copy.

> 2.
> +static pg_attribute_always_inline bool
> +IsParallelCopyAllowed(CopyState cstate)
> +{
> + /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
> + if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
> + return false;
> +
> + /* Check if copy is into foreign table or temporary table. */
> + if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
> + RelationUsesLocalBuffers(cstate->rel))
> + return false;
> +
> + /* Check if trigger function is parallel safe. */
> + if (cstate->rel->trigdesc != NULL &&
> + !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
> + return false;
> +
> + /*
> + * Check if there is after statement or instead of trigger or transition
> + * table triggers.
> + */
> + if (cstate->rel->trigdesc != NULL &&
> + (cstate->rel->trigdesc->trig_insert_after_statement ||
> + cstate->rel->trigdesc->trig_insert_instead_row ||
> + cstate->rel->trigdesc->trig_insert_new_table))
> + return false;
> +
> + /* Check if the volatile expressions are parallel safe, if present any. */
> + if (!CheckExprParallelSafety(cstate))
> + return false;
> +
> + /* Check if the insertion mode is single. */
> + if (FindInsertMethod(cstate) == CIM_SINGLE)
> + return false;
> +
> + return true;
> +}
>
> In the comments, we should write why parallelism is not allowed for a
> particular case. The cases where parallel-unsafe clause is involved
> are okay but it is not clear from comments why it is not allowed in
> other cases.
>

Added comments.
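
To illustrate the kind of explanation being asked for, one of the checks
could be annotated roughly like this (the wording is mine, not
necessarily what the v6 patch says):

	/*
	 * Temporary tables use backend-local buffers that parallel workers
	 * cannot access, and an insert into a foreign table goes through the
	 * FDW, which we cannot assume to be parallel safe, so fall back to
	 * sequential copy in both cases.
	 */
	if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
		RelationUsesLocalBuffers(cstate->rel))
		return false;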

> 3.
> + ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
> + ParallelCopyLineBoundary *lineInfo;
> + uint32 line_first_block = pcshared_info->cur_block_pos;
> + line_pos = UpdateBlockInLineInfo(cstate,
> + line_first_block,
> + cstate->raw_buf_index, -1,
> + LINE_LEADER_POPULATING);
> + lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
> + elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
> + line_first_block, lineInfo->start_offset, line_pos);
>
> Can we take all the code here inside function UpdateBlockInLineInfo? I
> see that it is called from one other place but I guess most of the
> surrounding code there can also be moved inside the function. Can we
> change the name of the function to UpdateSharedLineInfo or something
> like that and remove inline marking from this? I am not sure we want
> to inline such big functions. If it make difference in performance
> then we can probably consider it.
>

Changed as suggested.

> 4.
> EndLineParallelCopy()
> {
> ..
> + /* Update line size. */
> + pg_atomic_write_u32(&lineInfo->line_size, line_size);
> + pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
> + elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
> + line_pos, line_size);
> ..
> }
>
> Can we instead call UpdateSharedLineInfo (new function name for
> UpdateBlockInLineInfo) to do this and maybe see it only updates the
> required info? The idea is to centralize the code for updating
> SharedLineInfo.
>

Updated as suggested.
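
Roughly, the centralized helper could look like the sketch below; the
member names are the ones visible in the quoted snippets, everything
else (including the signature) is an assumption about the v6 patch:

static void
UpdateSharedLineInfo(CopyState cstate, uint32 line_pos, uint32 blk_pos,
					 uint32 offset, uint32 line_size, uint32 line_state)
{
	ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
	ParallelCopyLineBoundary *lineInfo = &pcshared_info->line_boundaries.ring[line_pos];

	/* Block/offset are only set when the leader starts populating a line. */
	if (line_state == LINE_LEADER_POPULATING)
	{
		lineInfo->first_block = blk_pos;
		lineInfo->start_offset = offset;
	}

	/* Publish the size first, then the state, so workers see both together. */
	pg_atomic_write_u32(&lineInfo->line_size, line_size);
	pg_atomic_write_u32(&lineInfo->line_state, line_state);
}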

> 5.
> +static uint32
> +GetLinePosition(CopyState cstate)
> +{
> + ParallelCopyData *pcdata = cstate->pcdata;
> + ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
> + uint32 previous_pos = pcdata->worker_processed_pos;
> + uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
>
> It seems to me that each worker has to hop through all the processed
> chunks before getting the chunk which it can process. This will work
> but I think it is better if we have some shared counter which can tell
> us the next chunk to be processed and avoid all the unnecessary work
> of hopping to find the exact position.

I tried using a spinlock to track this position instead of hopping
through the processed chunks, but I could not match the earlier
performance results; there was a slight degradation. The numbers below
are (execution time, number of workers, speedup):
Use case 2: 3 indexes on integer columns
Run on earlier patches without spinlock:
(220.680, 0, 1X), (185.096, 1, 1.19X), (134.811, 2, 1.64X), (114.585,
4, 1.92X), (107.707, 8, 2.05X), (101.253, 16, 2.18X), (100.749, 20,
2.19X), (100.656, 30, 2.19X)
Run on latest v6 patches with spinlock:
(216.059, 0, 1X), (177.639, 1, 1.22X), (145.213, 2, 1.49X), (126.370,
4, 1.71X), (121.013, 8, 1.78X), (102.933, 16, 2.1X), (103.000, 20,
2.1X), (100.308, 30, 2.15X)
I have not included these changes because of the performance
degradation. I will try to come up with a different solution for this
and discuss it in the coming days. This point is not yet handled.
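
One direction that could be tried instead of a spinlock is a lock-free
shared counter that hands out the next position to process; a minimal
sketch, assuming a hypothetical next_chunk_pos member of type
pg_atomic_uint32 in ParallelCopyShmInfo:

static uint32
GetNextChunkPosition(ParallelCopyShmInfo *pcshared_info)
{
	uint32		write_pos;

	/*
	 * A single fetch-and-add replaces the hop over already-processed
	 * chunks; the worker still has to wait until the leader has populated
	 * this ring position before it can process it.
	 */
	write_pos = pg_atomic_fetch_add_u32(&pcshared_info->next_chunk_pos, 1) % RINGSIZE;

	return write_pos;
}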

> v5-0004-Documentation-for-parallel-copy
> -----------------------------------------
> 1. Can you add one or two examples towards the end of the page where
> we have examples for other Copy options?
>
>
> Please run pgindent on all patches as that will make the code look better.

I have run pgindent on the latest patches.

> From the testing perspective,
> 1. Test by having something force_parallel_mode = regress which means
> that all existing Copy tests in the regression will be executed via
> new worker code. You can have this as a test-only patch for now and
> make sure all existing tests passed with this.
> 2. Do we have tests for toast tables? I think if you implement the
> previous point some existing tests might cover it but I feel we should
> have at least one or two tests for the same.
> 3. Have we checked the code coverage of the newly added code with
> existing tests?

These will be handled in the next few days.

These changes are present as part of the v6 patch set.

I'm summarizing the pending open points so that I don't miss anything:
1) Performance test on latest patch set.
2) Testing points suggested.
3) Support of parallel copy for COPY_OLD_FE.
4) Avoid the worker having to hop through all the processed chunks
before getting the chunk it can process.
5) Handling of Tomas's comments.
6) Handling of Greg's comments.

We plan to work on these and complete them in the next few days.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
