Re: Make COPY format extendable: Extract COPY TO format implementations

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-02-05 07:14:08
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 02, 2024 at 05:46:18PM +0900, Sutou Kouhei wrote:
> Hi,
> In <ZbyiDHIrxRgzYT99(at)paquier(dot)xyz>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 Feb 2024 17:04:28 +0900,
> Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> > One idea I was considering is whether we should use a special value in
> > the "format" DefElem, say "custom:$my_custom_format" where it would be
> > possible to bypass the formay check when processing options and find
> > the routines after processing all the options. I'm not wedded to
> > that, but attaching the routines to the state data is IMO the correct
> > thing, because this has nothing to do with CopyFormatOptions.
> Thanks for sharing your idea.
> Let's discuss how to support custom options after we
> complete the current performance changes.
> >> I'm OK with the approach. But how about adding the extra
> >> callbacks to Copy{From,To}StateData not
> >> Copy{From,To}Routines like CopyToStateData::data_dest_cb and
> >> CopyFromStateData::data_source_cb? They are only needed for
> >> "text" and "csv". So we don't need to add them to
> >> Copy{From,To}Routines to keep required callback minimum.
> >
> > And set them in cstate while we are in the Start routine, right?
> I imagined that it's done around the following part:
> @@ -1418,6 +1579,9 @@ BeginCopyFrom(ParseState *pstate,
> /* Extract options from the statement node tree */
> ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
> + /* Set format routine */
> + cstate->routine = CopyFromGetRoutine(cstate->opts);
> +
> /* Process the target relation */
> cstate->rel = rel;
> Example1:
> /* Set format routine */
> cstate->routine = CopyFromGetRoutine(cstate->opts);
> if (!cstate->opts.binary)
> if (cstate->opts.csv_mode)
> cstate->copy_read_attributes = CopyReadAttributesCSV;
> else
> cstate->copy_read_attributes = CopyReadAttributesText;
> Example2:
> static void
> CopyFromSetRoutine(CopyFromState cstate)
> {
> if (cstate->opts.csv_mode)
> {
> cstate->routine = &CopyFromRoutineCSV;
> cstate->copy_read_attributes = CopyReadAttributesCSV;
> }
> else if (cstate.binary)
> cstate->routine = &CopyFromRoutineBinary;
> else
> {
> cstate->routine = &CopyFromRoutineText;
> cstate->copy_read_attributes = CopyReadAttributesText;
> }
> }
> BeginCopyFrom()
> {
> /* Set format routine */
> CopyFromSetRoutine(cstate);
> }
> But I don't object your original approach. If we have the
> extra callbacks in Copy{From,To}Routines, I just don't use
> them for my custom format extension.
> >> What is the better next action for us? Do you want to
> >> complete the WIP v11 patch set by yourself (and commit it)?
> >> Or should I take over it?
> >
> > I was planning to work on that, but wanted to be sure how you felt
> > about the problem with text and csv first.
> OK.
> My opinion is the above. I have an idea how to implement it
> but it's not a strong idea. You can choose whichever you like.

So, I've looked at all that today, and finished by applying two
patches as of 2889fd23be56 and 95fb5b49024a to get some of the
weirdness with the workhorse routines out of the way. Both have added
callbacks assigned in their respective cstate data for text and csv.
As this is called within the OneRow routine, I can live with that. If
there is an opposition to that, we could just attach it within the
routines. The CopyAttributeOut routines had a strange argument
layout, actually, the flag for the quotes is required as a header uses
no quotes, but there was little point in the "single arg" case, so
I've removed it.

I am attaching a v12 which is close to what I want it to be, with
much more documentation and comments. There are two things that I've
changed compared to the previous versions though:
1) I have added a callback to set up the input and output functions
rather than attach that in the Start callback. These routines are now
called once per argument, where we know that the argument is valid.
The callbacks are in charge of filling the FmgrInfos. There are some
good reasons behind that:
- No need for plugins to think about how to allocate this data. v11
and other versions were doing things the wrong way by allocating this
stuff in the wrong memory context as we switch to the COPY context
when we are in the Start routines.
- This avoids attisdropped problems, and we have a long history of
bugs regarding that. I'm ready to bet that custom formats would get
that wrong.
2) I have backpedaled on the postpare callback, which did not bring
much in clarity IMO while being a CSV-only callback. Note that we
have in copyfromparse.c more paths that are only for CSV but the past
versions of the patch never cared about that. This makes the text and
CSV implementations much closer to each other, as a result.

I had mixed feelings about CopySendEndOfRow() being split to
CopyToTextSendEndOfRow() to send the line terminations when sending a
CSV/text row, but I'm OK with that at the end. v12 is mostly about
moving code around at this point, making it kind of straight-forward
to follow as the code blocks are the same. I'm still planning to do a
few more measurements, just lacked of time. Let me know if you have
comments about all that.

Attachment Content-Type Size
v12-0001-Extract-COPY-FROM-TO-format-implementations.patch text/x-diff 36.1 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-02-05 07:21:23 Re: cataloguing NOT NULL constraints
Previous Message Richard Guo 2024-02-05 07:00:03 cfbot does not list patches in 'Current commitfest'