Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: michael(at)paquier(dot)xyz
Cc: sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-02-05 09:05:15
Message-ID: 20240205.180515.746623205615816813.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <ZcCKwAeFrlOqPBuN(at)paquier(dot)xyz>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 5 Feb 2024 16:14:08 +0900,
Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> So, I've looked at all that today, and finished by applying two
> patches as of 2889fd23be56 and 95fb5b49024a to get some of the
> weirdness with the workhorse routines out of the way.

Thanks!

> As this is called within the OneRow routine, I can live with that. If
> there is an opposition to that, we could just attach it within the
> routines.

I don't object the approach.

> I am attaching a v12 which is close to what I want it to be, with
> much more documentation and comments. There are two things that I've
> changed compared to the previous versions though:
> 1) I have added a callback to set up the input and output functions
> rather than attach that in the Start callback.

I'm OK with this. I just don't use them in Apache Arrow COPY
FORMAT extension.

> - No need for plugins to think about how to allocate this data. v11
> and other versions were doing things the wrong way by allocating this
> stuff in the wrong memory context as we switch to the COPY context
> when we are in the Start routines.

Oh, sorry. I missed it when I moved them.

> 2) I have backpedaled on the postpare callback, which did not bring
> much in clarity IMO while being a CSV-only callback. Note that we
> have in copyfromparse.c more paths that are only for CSV but the past
> versions of the patch never cared about that. This makes the text and
> CSV implementations much closer to each other, as a result.

Ah, sorry. I forgot to eliminate cstate->opts.csv_mode in
CopyReadLineText(). The postpare callback is for
optimization. If it doesn't improve performance, we don't
need to introduce it.

We may want to try eliminating cstate->opts.csv_mode in
CopyReadLineText() for performance. But we don't need to
do this in introducing CopyFromRoutine. We can defer it.

So I don't object removing the postpare callback.

> Let me know if you have
> comments about all that.

Here are some comments for the patch:

+ /*
+ * Called when COPY FROM is started to set up the input functions
+ * associated to the relation's attributes writing to. `fmgr_info` can be

fmgr_info ->
finfo

+ * optionally filled to provide the catalog information of the input
+ * function. `typioparam` can be optinally filled to define the OID of

optinally ->
optionally

+ * the type to pass to the input function. `atttypid` is the OID of data
+ * type used by the relation's attribute.
+ */
+ void (*CopyFromInFunc) (Oid atttypid, FmgrInfo *finfo,
+ Oid *typioparam);

How about passing CopyFromState cstate too like other
callbacks for consistency?

+ /*
+ * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+ *
+ * 'econtext' is used to evaluate default expression for each column that
+ * is either not read from the file or is using the DEFAULT option of COPY

or is ->
or

(I'm not sure...)

+ * FROM. It is NULL if no default values are used.
+ *
+ * Returns false if there are no more tuples to copy.
+ */
+ bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+ Datum *values, bool *nulls);

+typedef struct CopyToRoutine
+{
+ /*
+ * Called when COPY TO is started to set up the output functions
+ * associated to the relation's attributes reading from. `fmgr_info` can

fmgr_info ->
finfo

+ * be optionally filled. `atttypid` is the OID of data type used by the
+ * relation's attribute.
+ */
+ void (*CopyToOutFunc) (Oid atttypid, FmgrInfo *finfo);

How about passing CopyToState cstate too like other
callbacks for consistency?

@@ -200,4 +204,10 @@ extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
extern int CopyReadAttributesCSV(CopyFromState cstate);
extern int CopyReadAttributesText(CopyFromState cstate);

+/* Callbacks for CopyFromRoutine->OneRow */

CopyFromRoutine->OneRow ->
CopyFromRoutine->CopyFromOneRow

+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+ Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+ Datum *values, bool *nulls);
+
#endif /* COPYFROM_INTERNAL_H */

+/*
+ * CopyFromTextStart

CopyFromTextStart ->
CopyFromBinaryStart

+ *
+ * Start of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+ /* Read and verify binary header */
+ ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * CopyFromTextEnd

CopyFromTextEnd ->
CopyFromBinaryEnd

+ *
+ * End of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+ /* nothing to do */
+}

diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 91433d439b..d02a7773e3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -473,6 +473,7 @@ ConvertRowtypeExpr
CookedConstraint
CopyDest
CopyFormatOptions
+CopyFromRoutine
CopyFromState
CopyFromStateData
CopyHeaderChoice
@@ -482,6 +483,7 @@ CopyMultiInsertInfo
CopyOnErrorChoice
CopySource
CopyStmt
+CopyToRoutine
CopyToState
CopyToStateData
Cost

Wow! I didn't know that we need to update typedefs.list when
I add a "typedef struct".

Thanks,
--
kou

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-02-05 09:45:50 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Graham Leggett 2024-02-05 08:51:50 Re: Grant read-only access to exactly one database amongst many