Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: michael(at)paquier(dot)xyz
Cc: sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-01-31 05:11:22
Message-ID: 20240131.141122.279551156957581322.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <Zbi1TwPfAvUpKqTd(at)paquier(dot)xyz>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 30 Jan 2024 17:37:35 +0900,
Michael Paquier <michael(at)paquier(dot)xyz> wrote:

>> We use defel->location for an error message. (We don't need
>> to set location for the default "text" DefElem.)
>
> Yeah, but you should not need to have this error in the paths that set
> the callback routines in opts_out if the same validation happens a few
> lines before, in copy.c.

Ah, yes. defel->location is used in later patches. For
example, it's used when a COPY handler for the specified
FORMAT isn't found.

> I am really worried about the complexities
> this thread is getting into because we are trying to shape the
> callbacks in the most generic way possible based on *two* use cases.
> This is going to be a never-ending discussion. I'd rather get some
> simple basics, and then we can discuss if tweaking the callbacks is
> really necessary or not. Even after introducing the pg_proc lookups
> to get custom callbacks.

I understand your concern. Let's introduce minimal callbacks
as the first step. I think that we completed our design
discussion for this feature. We can choose minimal callbacks
based on the discussion.

> The custom options don't seem like an absolute strong
> requirement for the first shot with the callbacks or even the
> possibility to retrieve the callbacks from a function call. I mean,
> you could provide some control with SET commands and a few GUCs, at
> least, even if that would be strange. Manipulations with a list of
> DefElems is the intuitive way to have custom options at query level,
> but we also have to guess the set of callbacks from this list of
> DefElems coming from the query. You see my point, I am not sure
> if it would be the best thing to process twice the options, especially
> when it comes to decide if a DefElem should be valid or not depending
> on the callbacks used. Or we could use a kind of "special" DefElem
> where we could store a set of key:value fed to a callback :)

Interesting. Let's remove custom options support from the
initial minimal callbacks.

>> Can I prepare the v10 patch set as "the v7 patch set" + "the
>> removal of the "if" checks in NextCopyFromRawFields()"?
>> (+ reverting opts_out->{csv_mode,binary} changes in
>> ProcessCopyOptions().)
>
> Yep, if I got it that would make sense to me. If you can do that,
> that would help quite a bit. :)

I've prepared the v10 patch set. Could you try this?

Changes since the v7 patch set:

0001:

* Remove CopyToProcessOption() callback
* Remove CopyToGetFormat() callback
* Revert passing CopyToState to ProcessCopyOptions()
* Revert moving "opts_out->{csv_mode,binary} = true" to
ProcessCopyOptionFormatTo()
* Change to receive "const char *format" instead "DefElem *defel"
by ProcessCopyOptionFormatTo()

0002:

* Remove CopyFromProcessOption() callback
* Remove CopyFromGetFormat() callback
* Change to receive "const char *format" instead "DefElem
*defel" by ProcessCopyOptionFormatFrom()
* Remove "if (cstate->opts.csv_mode)" branches from
NextCopyFromRawFields()

FYI: Here are Copy{From,To}Routine in the v10 patch set. I
think that only Copy{From,To}OneRow are minimal callbacks
for the performance gain. But can we keep Copy{From,To}Start
and Copy{From,To}End for consistency? We can remove a few
{csv_mode,binary} conditions by Copy{From,To}{Start,End}. It
doesn't depend on the number of COPY target tuples. So they
will not affect performance.

/* Routines for a COPY FROM format implementation. */
typedef struct CopyFromRoutine
{
/*
* Called when COPY FROM is started. This will initialize something and
* receive a header.
*/
void (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);

/* Copy one row. It returns false if no more tuples. */
bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);

/* Called when COPY FROM is ended. This will finalize something. */
void (*CopyFromEnd) (CopyFromState cstate);
} CopyFromRoutine;

/* Routines for a COPY TO format implementation. */
typedef struct CopyToRoutine
{
/* Called when COPY TO is started. This will send a header. */
void (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);

/* Copy one row for COPY TO. */
void (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);

/* Called when COPY TO is ended. This will send a trailer. */
void (*CopyToEnd) (CopyToState cstate);
} CopyToRoutine;

Thanks,
--
kou

Attachment Content-Type Size
v10-0001-Extract-COPY-TO-format-implementations.patch text/x-patch 19.6 KB
v10-0002-Extract-COPY-FROM-format-implementations.patch text/x-patch 26.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2024-01-31 05:18:27 RE: Improve eviction algorithm in ReorderBuffer
Previous Message Peter Smith 2024-01-31 05:10:16 Re: Synchronizing slots from primary to standby