Re: Split copy.c

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Split copy.c
Date: 2020-11-03 07:38:36
Message-ID: 21d9ed60-1269-c1f2-7f89-6bd722d535e2@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/11/2020 04:15, David Rowley wrote:
> On Tue, 3 Nov 2020 at 07:35, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>> On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote:
>>> On 02/11/2020 19:23, Andres Freund wrote:
>>>> On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote:
>>>>> There isn't much common code between COPY FROM and COPY TO, so I propose
>>>>> that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin
>>>>> that's much nicer.
>>>>
>>>> Not quite convinced that's the right split - or perhaps there's just
>>>> more potential. My feeling is that splitting out all the DML related
>>>> code would make the code considerably easier to read.
>>>
>>> What do you mean by DML related code?
>>
>> Basically all the insertion related code (e.g CopyMultiInsert*, lots of
>> code in CopyFrom()) and perhaps also the type input invocations.

Hmm. COPY FROM consists of two parts:

1. Parse the input text/CSV/binary format into Datums.

2. Store the Datums in the table.

They're somewhat split already. If you want to only do the parsing part,
you can use the BeginCopyFrom() and NextCopyFrom() functions to get
Datums, without storing them to a table. file_fdw uses that.

Yeah, that might indeed be another good split point. Attached is quick
prototype of that. I tried to avoid doing other changes as part of this
split, but some further refactoring might be good. Like extracting the
state for the input parsing from CopyFromStateData into a separate struct.

With these patches:

$ wc -l src/backend/commands/copy*.c
782 src/backend/commands/copy.c
1641 src/backend/commands/copyfrom.c
1646 src/backend/commands/copyfromparse.c
1363 src/backend/commands/copyto.c
5432 total

> I quite like the fact that those are static and inline-able. I very
> much imagine there'd be a performance hit if we moved them out to
> another .c file and made them extern. Some of those functions can be
> quite hot when copying into a partitioned table.

Agreed.

- Heikki

Attachment Content-Type Size
v3-0001-Split-copy.c-into-copyto.c-and-copyfrom.c.patch text/x-patch 276.0 KB
v3-0002-Split-copyfrom.c-further-into-copyfrom.c-and-copy.patch text/x-patch 104.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-11-03 07:44:23 Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)
Previous Message Jinbao Chen 2020-11-03 07:29:38 Re: Add table AM 'tid_visible'