Re: Make COPY format extendable: Extract COPY TO format implementations

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-02-02 06:21:31
Message-ID: ZbyJ60Fd7CHt4m0i@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 02, 2024 at 09:40:56AM +0900, Sutou Kouhei wrote:
> Thanks. It'll help us.

I have done a review of v10, see v11 attached which is still WIP, with
the patches for COPY TO and COPY FROM merged together. Note that I'm
thinking to merge them into a single commit.

@@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
bool convert_selectively; /* do selective binary conversion? */
CopyOnErrorChoice on_error; /* what to do when error happened */
List *convert_select; /* list of column names (can be NIL) */
+ const CopyToRoutine *to_routine; /* callback routines for COPY TO */
} CopyFormatOptions;

Adding the routines to the structure for the format options is in my
opinion incorrect. The elements of this structure are first processed
in the option deparsing path, and then we need to use the options to
guess which routines we need. A more natural location is cstate
itself, so as the pointer to the routines is isolated within copyto.c
and copyfrom_internal.h. My point is: the routines are an
implementation detail that the centralized copy.c has no need to know
about. This also led to a strange separation with
ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
the hole in-between.

The separation between cstate and the format-related fields could be
much better, though I am not sure if it is worth doing as it
introduces more duplication. For example, max_fields and raw_fields
are specific to text and csv, while binary does not care much.
Perhaps this is just useful to be for custom formats.

copyapi.h needs more documentation, like what is expected for
extension developers when using these, what are the arguments, etc. I
have added what I had in mind for now.

+typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m);

CopyReadAttributes and PostpareColumnValue are also callbacks specific
to text and csv, except that they are used within the per-row
callbacks. The same can be said about CopyAttributeOutHeaderFunction.
It seems to me that it would be less confusing to store pointers to
them in the routine structures, where the final picture involves not
having multiple layers of APIs like CopyToCSVStart,
CopyAttributeOutTextValue, etc. These *have* to be documented
properly in copyapi.h, and this is much easier now that cstate stores
the routine pointers. That would also make simpler function stacks.
Note that I have not changed that in the v11 attached.

This business with the extra callbacks required for csv and text is my
main point of contention, but I'd be OK once the model of the APIs is
more linear, with everything in Copy{From,To}State. The changes would
be rather simple, and I'd be OK to put my hands on it. Just,
Sutou-san, would you agree with my last point about these extra
callbacks?
--
Michael

Attachment Content-Type Size
v11-0001-Extract-COPY-FROM-TO-format-implementations.patch text/x-diff 43.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-02-02 06:26:36 Re: Commitfest 2024-01 is now closed
Previous Message Amit Kapila 2024-02-02 06:20:18 Re: Synchronizing slots from primary to standby