Re: speedup COPY TO for partitioned table.

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: speedup COPY TO for partitioned table.
Date: 2025-06-26 01:43:52
Message-ID: 448668ed952443210cdff9867559fc17@oss.nttdata.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2025-06-05 09:45, jian he wrote:
> hi.
>
> In the V10 patch, there will be some regression if the partition column
> ordering is different from the root partitioned table.
>
> because in V10 CopyThisRelTo
> + while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
> + {
> + if (map != NULL)
> + {
> + original_slot = slot;
> + root_slot = MakeSingleTupleTableSlot(rootdesc,
> &TTSOpsBufferHeapTuple);
> + slot = execute_attr_map_slot(map, slot, root_slot);
> + }
> + else
> + slot_getallattrs(slot);
> +
> + if (original_slot != NULL)
> + ExecDropSingleTupleTableSlot(original_slot);
> +}
> as you can see, for each slot in the partition, i called
> MakeSingleTupleTableSlot to get the dumpy root_slot
> and ExecDropSingleTupleTableSlot too.
> that will cause overhead.
>
> we can call produce root_slot before the main while loop.
> like the following:
> + if (root_rel != NULL)
> + {
> + rootdesc = RelationGetDescr(root_rel);
> + root_slot = table_slot_create(root_rel, NULL);
> + map = build_attrmap_by_name_if_req(rootdesc, tupdesc, false);
> + }
> ....
> + while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
> + {
> + TupleTableSlot *copyslot;
> + if (map != NULL)
> + copyslot = execute_attr_map_slot(map, slot, root_slot);
> + else
> + {
> + slot_getallattrs(slot);
> + copyslot = slot;
> + }
> +
> please check CopyThisRelTo in v11. so, with v11, there is no
> regression for case when
> partition column ordering differs from partitioned.

Thanks for working on this improvement.

Here are some minor comments on v11 patch:

> + For example, if <replaceable class="parameter">table</replaceable>
> is not partitioned table,
> <literal>COPY <replaceable class="parameter">table</replaceable>
> TO</literal> copies the same rows as
> <literal>SELECT * FROM ONLY <replaceable
> class="parameter">table</replaceable></literal>

This describes the behavior when the table is not partitioned, but would
it also be helpful to mention the behavior when the table is a
partitioned table?
For example:

If table is a partitioned table, then COPY table TO copies the same
rows as SELECT * FROM table.

> + * if COPY TO source table is a partitioned table, then open
> each

if -> If

> + scan_rel = table_open(scan_oid,
> AccessShareLock);
>
> - /* Format and send the data */
> - CopyOneRowTo(cstate, slot);
> + CopyThisRelTo(cstate, scan_rel, cstate->rel,
> &processed);
>
> - /*
> - * Increment the number of processed tuples,
> and report the
> - * progress.
> - */
> -
> pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED,
> -
> ++processed);
> + table_close(scan_rel, AccessShareLock)

After applying the patch, blank lines exist between these statements as
below. Do we really need these blank lines?

```
scan_rel = table_open(scan_oid,
AccessShareLock);

CopyThisRelTo(cstate, scan_rel, cstate->rel,
&processed);

table_close(scan_rel, AccessShareLock);
``

> +/*
> + * rel: the relation from which the actual data will be copied.
> + * root_rel: if not NULL, it indicates that we are copying partitioned
> relation
> + * data to the destination, and "rel" is the partition of "root_rel".
> + * processed: number of tuples processed.
> +*/
> +static void
> +CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,

This comment only describes the parameters. Wouldn't it better to add a
brief summary of what this function does overall?

+ * A partition's rowtype might differ from the root table's.
Since we are
+ * export partitioned table data here, we must convert it back
to the root

are export -> are exporting?

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2025-06-26 02:01:35 Re: Eager aggregation, take 3
Previous Message Michael Paquier 2025-06-26 00:52:20 Re: pg_logical_slot_get_changes waits continously for a partial WAL record spanning across 2 pages