Re: Make COPY format extendable: Extract COPY TO format implementations

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: sawada(dot)mshk(at)gmail(dot)com, michael(at)paquier(dot)xyz, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-01-12 06:40:41
Message-ID: CAEG8a3J02NzGBxG1rP9C4u7qRLOqUjSOdy3q5_5v__fydS3XcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Wed, Jan 10, 2024 at 2:20 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAEG8a3+jG_NKOUmcxDyEX2xSggBXReZ4H=e3RFsUtedY88A03w(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:58:05 +0800,
> Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:
>
> >> 1. Add an opaque space for custom COPY TO handler
> >> * Add CopyToState{Get,Set}Opaque()
> >> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
> >>
> >> 2. Export CopyToState::attnumlist
> >> * Add CopyToStateGetAttNumList()
> >> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
> >>
> >> 3. Export CopySend*()
> >> * Rename CopySend*() to CopyToStateSend*() and export them
> >> * Exception: CopySendEndOfRow() to CopyToStateFlush() because
> >> it just flushes the internal buffer now.
> >> https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
> >>
> > I guess the purpose of these helpers is to avoid expose CopyToState to
> > copy.h,
>
> Yes.
>
> > but I
> > think expose CopyToState to user might make life easier, users might want to use
> > the memory contexts of the structure (though I agree not all the
> > fields are necessary
> > for extension handers).
>
> OK. I don't object it as I said in another e-mail:
> https://www.postgresql.org/message-id/flat/20240110.120644.1876591646729327180.kou%40clear-code.com#d923173e9625c20319750155083cbd72
>
> >> 2. Need an opaque space like IndexScanDesc::opaque does
> >>
> >> * A custom COPY TO handler needs to keep its data
> >
> > I once thought users might want to parse their own options, maybe this
> > is a use case for this opaque space.
>
> Good catch! I forgot to suggest a callback for custom format
> options. How about the following API?
>
> ----
> ...
> typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
>
> ...
> typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel);
>
> typedef struct CopyToFormatRoutine
> {
> ...
> CopyToProcessOption_function process_option_fn;
> } CopyToFormatRoutine;
>
> typedef struct CopyFromFormatRoutine
> {
> ...
> CopyFromProcessOption_function process_option_fn;
> } CopyFromFormatRoutine;
> ----
>
> ----
> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
> index e7597894bf..1aa8b62551 100644
> --- a/src/backend/commands/copy.c
> +++ b/src/backend/commands/copy.c
> @@ -416,6 +416,7 @@ void
> ProcessCopyOptions(ParseState *pstate,
> CopyFormatOptions *opts_out,
> bool is_from,
> + void *cstate, /* CopyToState* for !is_from, CopyFromState* for is_from */
> List *options)
> {
> bool format_specified = false;
> @@ -593,11 +594,19 @@ ProcessCopyOptions(ParseState *pstate,
> parser_errposition(pstate, defel->location)));
> }
> else
> - ereport(ERROR,
> - (errcode(ERRCODE_SYNTAX_ERROR),
> - errmsg("option \"%s\" not recognized",
> - defel->defname),
> - parser_errposition(pstate, defel->location)));
> + {
> + bool processed;
> + if (is_from)
> + processed = opts_out->from_ops->process_option_fn(cstate, defel);
> + else
> + processed = opts_out->to_ops->process_option_fn(cstate, defel);
> + if (!processed)
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("option \"%s\" not recognized",
> + defel->defname),
> + parser_errposition(pstate, defel->location)));
> + }
> }
>
> /*
> ----

Looks good.

>
> > For the name, I thought private_data might be a better candidate than
> > opaque, but I do not insist.
>
> I don't have a strong opinion for this. Here are the number
> of headers that use "private_data" and "opaque":
>
> $ grep -r private_data --files-with-matches src/include | wc -l
> 6
> $ grep -r opaque --files-with-matches src/include | wc -l
> 38
>
> It seems that we use "opaque" than "private_data" in general.
>
> but it seems that we use
> "opaque" than "private_data" in our code.
>
> > Do you use the arrow library to control the memory?
>
> Yes.
>
> > Is there a way that
> > we can let the arrow use postgres' memory context?
>
> Yes. Apache Arrow C++ provides a memory pool feature and we
> can implement PostgreSQL's memory context based memory pool
> for this. (But this is a custom COPY TO/FROM handler's
> implementation details.)
>
> > I'm not sure this
> > is necessary, just raise the question for discussion.
>
> Could you clarify what should we discuss? We should require
> that COPY TO/FROM handlers should use PostgreSQL's memory
> context for all internal memory allocations?

Yes, handlers should use PostgreSQL's memory context, and I think
creating other memory context under CopyToStateData.copycontext
should be suggested for handler creators, so I proposed exporting
CopyToStateData to public header.
>
> > +PG_FUNCTION_INFO_V1(copy_testfmt_handler);
> > +Datum
> > +copy_testfmt_handler(PG_FUNCTION_ARGS)
> > +{
> > + bool is_from = PG_GETARG_BOOL(0);
> > + CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);;
> > +
> >
> > extra semicolon.
>
> I noticed it too :-)
> But I ignored it because the current implementation is only
> for discussion. We know that it may be dirty.
>
>
> Thanks,
> --
> kou

--
Regards
Junwang Zhao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-01-12 06:54:09 Re: Synchronizing slots from primary to standby
Previous Message Bertrand Drouvot 2024-01-12 06:37:27 Re: Synchronizing slots from primary to standby