Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-11-18 09:56:09
Message-ID: CALDaNm05FnA-ePvYV_t2+WE_tXJymbfPwnm+kc9y1iMkR+NbUg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 29, 2020 at 11:45 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Oct 27, 2020 at 7:06 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> [latest version]
>
> I think the parallel-safety checks in this patch
> (v9-0002-Allow-copy-from-command-to-process-data-from-file) are
> incomplete and wrong. See below comments.
> 1.
> +static pg_attribute_always_inline bool
> +CheckExprParallelSafety(CopyState cstate)
> +{
> + if (contain_volatile_functions(cstate->whereClause))
> + {
> + if (max_parallel_hazard((Query *) cstate->whereClause) != PROPARALLEL_SAFE)
> + return false;
> + }
>
> I don't understand the above check. Why do we only need to check the
> WHERE clause for parallel-safety when it contains volatile functions? It
> should be checked otherwise as well, no? A similar comment applies to
> the other checks in this function. Also, I don't think there is a need
> to make this function inline.
>

I felt we should check that the WHERE clause is parallel safe and also
that it does not contain volatile functions; the latter is to avoid
cases where the expression might query the table we are inserting into.
Modified it accordingly.
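
Roughly, the combined check I have in mind looks like the following (a
sketch only; it reuses the names and the (Query *) cast from the quoted
v9 code, and the exact v10 code may differ):

static bool
CheckExprParallelSafety(CopyState cstate)
{
    /*
     * Disallow parallelism if the WHERE clause is parallel unsafe or if
     * it contains volatile functions, since a volatile expression could
     * query the table we are inserting into.
     */
    if (cstate->whereClause != NULL &&
        (contain_volatile_functions(cstate->whereClause) ||
         max_parallel_hazard((Query *) cstate->whereClause) != PROPARALLEL_SAFE))
        return false;

    return true;
}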

> 2.
> +/*
> + * IsParallelCopyAllowed
> + *
> + * Check if parallel copy can be allowed.
> + */
> +bool
> +IsParallelCopyAllowed(CopyState cstate)
> {
> ..
> + * When there are BEFORE/AFTER/INSTEAD OF row triggers on the table. We do
> + * not allow parallelism in such cases because such triggers might query
> + * the table we are inserting into and act differently if the tuples that
> + * have already been processed and prepared for insertion are not there.
> + * Now, if we allow parallelism with such triggers the behaviour would
> + * depend on if the parallel worker has already inserted or not that
> + * particular tuples.
> + */
> + if (cstate->rel->trigdesc != NULL &&
> + (cstate->rel->trigdesc->trig_insert_after_statement ||
> + cstate->rel->trigdesc->trig_insert_new_table ||
> + cstate->rel->trigdesc->trig_insert_before_row ||
> + cstate->rel->trigdesc->trig_insert_after_row ||
> + cstate->rel->trigdesc->trig_insert_instead_row))
> + return false;
> ..
>
> Why do we need to disable parallelism for before/after row triggers
> unless they have parallel-unsafe functions? I see that a few lines down
> in this function you are checking the parallel-safety of trigger
> functions; what is the use of that if you are already disabling
> parallelism with the above check?
>

Currently only BEFORE STATEMENT triggers are supported; the remaining
trigger types are not, and the comment above the checks mentions this.
Removed the parallel-safety check that was not required.
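
For reference, the check reduces to roughly the following (a sketch;
the comment wording in v10 may differ). A BEFORE STATEMENT trigger
fires only once, before any rows are processed, so it does not need to
block parallelism:

    /*
     * Only BEFORE STATEMENT insert triggers are supported; any other
     * insert trigger on the relation disables parallelism.
     */
    if (cstate->rel->trigdesc != NULL &&
        (cstate->rel->trigdesc->trig_insert_after_statement ||
         cstate->rel->trigdesc->trig_insert_new_table ||
         cstate->rel->trigdesc->trig_insert_before_row ||
         cstate->rel->trigdesc->trig_insert_after_row ||
         cstate->rel->trigdesc->trig_insert_instead_row))
        return false;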

> 3. What about if the index on table has expressions that are
> parallel-unsafe? What is your strategy to check parallel-safety for
> partitioned tables?
>
> I suggest checking Greg's patch for parallel-safety of Inserts [1]. I
> think you will find that most of those checks are required here as
> well and see how we can use that patch (at least what is common). I
> feel the first patch should be just to have parallel-safety checks and
> we can test that by trying to enable Copy with force_parallel_mode. We
> can build the rest of the patch atop of it or in other words, let's
> move all parallel-safety work into a separate patch.
>

I have kept this as a separate patch for now. I will check whether I
can use Greg's changes as they are; if required, I will provide a few
review comments on Greg's patch so that it is usable for parallel copy
too, and later post a separate patch with the changes on top of it.
Until then I will retain it as a separate patch.

> Few assorted comments:
> ========================
> 1.
> +/*
> + * ESTIMATE_NODE_SIZE - Estimate the size required for node type in shared
> + * memory.
> + */
> +#define ESTIMATE_NODE_SIZE(list, listStr, strsize) \
> +{ \
> + uint32 estsize = sizeof(uint32); \
> + if ((List *)list != NIL) \
> + { \
> + listStr = nodeToString(list); \
> + estsize += strlen(listStr) + 1; \
> + } \
> + \
> + strsize = add_size(strsize, estsize); \
> +}
>
> This can be probably a function instead of a macro.
>

Changed it to a function.
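
Roughly, the macro becomes something like this (a sketch; the name and
exact signature in v10 may differ), with callers doing
strsize = add_size(strsize, EstimateNodeSize(...)):

static uint32
EstimateNodeSize(void *node, char **node_str)
{
    uint32      estsize = sizeof(uint32);

    *node_str = NULL;
    if (node != NULL)
    {
        *node_str = nodeToString(node);
        estsize += strlen(*node_str) + 1;
    }

    return estsize;
}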

> 2.
> +/*
> + * ESTIMATE_1BYTE_STR_SIZE - Estimate the size required for 1Byte strings in
> + * shared memory.
> + */
> +#define ESTIMATE_1BYTE_STR_SIZE(src, strsize) \
> +{ \
> + strsize = add_size(strsize, sizeof(uint8)); \
> + strsize = add_size(strsize, (src) ? 1 : 0); \
> +}
>
> This could be an inline function.
>

Changed it to an inline function.
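
Something along these lines (a sketch; the name Estimate1ByteStrSize is
only illustrative):

static inline Size
Estimate1ByteStrSize(Size strsize, const char *src)
{
    strsize = add_size(strsize, sizeof(uint8));
    return add_size(strsize, (src != NULL) ? 1 : 0);
}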

> 3.
> +/*
> + * SERIALIZE_1BYTE_STR - Copy 1Byte strings to shared memory.
> + */
> +#define SERIALIZE_1BYTE_STR(dest, src, copiedsize) \
> +{ \
> + uint8 len = (src) ? 1 : 0; \
> + memcpy(dest + copiedsize, (uint8 *) &len, sizeof(uint8)); \
> + copiedsize += sizeof(uint8); \
> + if (src) \
> + dest[copiedsize++] = src[0]; \
> +}
>
> Similarly, this could be a function. I think keeping such things as
> macros in between code makes it difficult to read. Please see if you
> can turn these and similar macros into functions unless they are only
> doing a few memory instructions. Having functions makes it easier to
> debug the code as well.
>

Changed it to a function.
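
Roughly (a sketch; the name Serialize1ByteStr is only illustrative):

static void
Serialize1ByteStr(char *dest, const char *src, uint32 *copiedsize)
{
    uint8       len = (src != NULL) ? 1 : 0;

    memcpy(dest + *copiedsize, &len, sizeof(uint8));
    *copiedsize += sizeof(uint8);
    if (src != NULL)
        dest[(*copiedsize)++] = src[0];
}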

The attached v10 patches include the fixes for the above.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v10-0001-Copy-code-readjustment-to-support-parallel-copy.patch text/x-patch 30.9 KB
v10-0002-Check-if-parallel-copy-can-be-performed.patch text/x-patch 21.3 KB
v10-0003-Allow-copy-from-command-to-process-data-from-fil.patch text/x-patch 73.0 KB
v10-0004-Documentation-for-parallel-copy.patch text/x-patch 2.7 KB
v10-0005-Parallel-Copy-For-Binary-Format-Files.patch text/x-patch 23.3 KB
v10-0006-Tests-for-parallel-copy.patch text/x-patch 34.6 KB
