From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Sutou Kouhei <kou(at)clear-code(dot)com> |
Cc: | michael(at)paquier(dot)xyz, david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2025-09-08 21:08:16 |
Message-ID: | CAD21AoCCjKA77xkUxx59qJ8an_G_58Mry_EtCEcFgd=g9N2xew@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Aug 13, 2025 at 11:37 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <CAD21AoBa0Wm3C2H12jaqkvLidP2zEhsC+gf=3w7XiA4LQnvx0g(at)mail(dot)gmail(dot)com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 28 Jul 2025 22:19:36 -0700,
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > The fields in 1 are mostly static fields, and the fields in 2 and 3
> > are likely to be accessed in hot functions during COPY FROM. Would it
> > be a good idea to restructure these fields so that we can hide the
> > fields in 1 from callback functions and having the fields in 3 in a
> > separate format-specific struct that can be accessed via an opaque
> > pointer? But could the latter change potentially cause performance
> > overheads?
>
> Yes. It changes memory layout (1 continuous memory chunk ->
> 2 separated memory chunks) and introduces indirect member
> accesses (x->y -> x->z->y).
I think fields accessed by the hot functions are limited. If we
assemble fields required by hot functions into one struct and pass it
to them can we deal with the latter? For example, we assemble the
fields in 3 I mentioned above (i.e., built-in format specific fields)
into say CopyFromStateBuiltin and pass it to CopyReadLine() function
and the following functions, instead of CopyFromState. Since there are
some places where we need to access to CopyFromState (e.g.,
CopyGetData()), CopyFromStateBuiltin needs to have a pointer to
CopyFromState as well.
> They may not have performance
> impact but we need to measure it if we want to use this
> approach.
Agreed.
> BTW, how about the following approach?
>
> copyapi.h:
>
> typedef struct CopyToStateData
> {
> /* public members */
> /* ... */
> } CopyToStateData;
>
> copyto.c:
>
> typedef struct CopyToStateInternalData
> {
> CopyToStateData parent;
>
> /* private members */
> /* ... */
> } CopyToStateInternalData;
>
> We export CopyToStateData only with public members. We don't
> export CopyToStateInternalData that has members only for
> built-in formats.
>
> CopyToStateInternalData has the same memory layout as
> CopyToStateData. So we can use CopyToStateInternalData as
> CopyToStateData.
>
> We use CopyToStateData not CopyToStateInternalData in public
> API. We cast CopyToStateData to CopyToStateInternalData when
> we need to use private members:
>
> static void
> CopySendData(CopyToState cstate, const void *databuf, int datasize)
> {
> CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
> appendBinaryStringInfo(cstate_internal->fe_msgbuf, databuf, datasize);
> }
>
> It's still direct member access.
>
>
> With this approach, we can keep the same memory layout (1
> continuous memory chunk) and direct member access. I think
> that this approach doesn't have performance impact.
>
> See the attached patch for PoC of this approach.
>
> Drawback: This approach always allocates
> CopyToStateInternalData not CopyToStateData. So we need to
> allocate needless memories for extensions. But this will
> prevent performance regression of built-in formats. Is it
> acceptable?
While this approach could make sense to avoid potential performance
overheads for built-in format, I find it's somewhat odd that
extensions cannot allocate memory for its working state having
CopyToStateData as the base type.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2025-09-08 21:10:08 | Re: shmem_startup_hook called twice on Windows |
Previous Message | Andres Freund | 2025-09-08 21:02:43 | Re: Should io_method=worker remain the default? |