Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: michael(at)paquier(dot)xyz
Cc: andres(at)anarazel(dot)de, sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-02-09 07:32:05
Message-ID: 20240209.163205.704848659612151781.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <ZcWoTr1N0GELFA9E(at)paquier(dot)xyz>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 9 Feb 2024 13:21:34 +0900,
Michael Paquier <michael(at)paquier(dot)xyz> wrote:

>> - Revisit what we have here, looking at more profiles to see how HEAD
>> an v13 compare. It looks like we are on a good path, but let's tackle
>> things one step at a time.
>
> And attached is a v14 that's rebased on HEAD.

Thanks!

> A next step I think we could take is to split the binary-only and the
> text/csv-only data in each cstate into their own structure to make the
> structure, with an opaque pointer that custom formats could use, but a
> lot of fields are shared as well.

It'll make COPY code base cleaner but it may decrease
performance. How about just adding an opaque pointer to each
cstate as the next step and then try the split?

My suggestion:
1. Introduce Copy{To,From}Routine
(We can do it based on the v14 patch.)
2. Add an opaque pointer to Copy{To,From}Routine
(This must not have performance impact.)
3.a. Split format specific data to the opaque space
3.b. Add support for registering custom format handler by
creating a function
4. ...

> This patch is already complicated
> enough IMO, so I'm OK to leave it out for the moment, and focus on
> making this infra pluggable as a next step.

I agree with you.

> Are there any comments about this v14? Sutou-san?

Here are my comments:

+ /* Set read attribute callback */
+ if (cstate->opts.csv_mode)
+ cstate->copy_read_attributes = CopyReadAttributesCSV;
+ else
+ cstate->copy_read_attributes = CopyReadAttributesText;

I think that we should not use this approach for
performance. We need to use "static inline" and constant
argument instead something like the attached
remove-copy-read-attributes.diff.

We have similar codes for
CopyReadLine()/CopyReadLineText(). The attached
remove-copy-read-attributes-and-optimize-copy-read-line.diff
also applies the same optimization to
CopyReadLine()/CopyReadLineText().

I hope that this improved performance of COPY FROM.

+/*
+ * Routines assigned to each format.
++

Garbage "+"

+ * CSV and text share the same implementation, at the exception of the
+ * copy_read_attributes callback.
+ */

+/*
+ * CopyToTextOneRow
+ *
+ * Process one row for text/CSV format.
+ */
+static void
+CopyToTextOneRow(CopyToState cstate,
+ TupleTableSlot *slot)
+{
...
+ if (cstate->opts.csv_mode)
+ CopyAttributeOutCSV(cstate, string,
+ cstate->opts.force_quote_flags[attnum - 1]);
+ else
+ CopyAttributeOutText(cstate, string);
...

How about use "static inline" and constant argument approach
here too?

static inline void
CopyToTextBasedOneRow(CopyToState cstate,
TupleTableSlot *slot,
bool csv_mode)
{
...
if (cstate->opts.csv_mode)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
CopyAttributeOutText(cstate, string);
...
}

static void
CopyToTextOneRow(CopyToState cstate,
TupleTableSlot *slot,
bool csv_mode)
{
CopyToTextBasedOneRow(cstate, slot, false);
}

static void
CopyToCSVOneRow(CopyToState cstate,
TupleTableSlot *slot,
bool csv_mode)
{
CopyToTextBasedOneRow(cstate, slot, true);
}

static const CopyToRoutine CopyCSVRoutineText = {
...
.CopyToOneRow = CopyToCSVOneRow,
...
};

Thanks,
--
kou

Attachment Content-Type Size
remove-copy-read-attributes.diff text/x-patch 7.8 KB
remove-copy-read-attributes-and-optimize-copy-read-line.diff text/x-patch 13.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-02-09 07:42:53 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Jim Jones 2024-02-09 07:24:41 Re: Psql meta-command conninfo+