Re: COPY as a set returning function

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY as a set returning function
Date: 2016-10-14 20:56:17
Message-ID: CADkLM=cdXdB3UQ4MNeg86WAGpZRDb8Rd4K+3vmS+tDLOTSJLEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> > That sounds fantastic. It'd help this copy variant retain festure parity
> > with normal copy. And it'd bring us closer to being able to FETCH in non
> > queries.
>
> On second thought, though, this couldn't exactly duplicate the existing
> COPY syntax, because COPY relies heavily on the rowtype of the named
> target table to tell it what it's copying. You'd need some new syntax
> to provide the list of column names and types, which puts a bit of
> a hole in the "syntax we already know" argument. A SRF-returning-record
> would have a leg up on that, because we do have existing syntax for
> defining the concrete rowtype that any particular call returns.
>
> regards, tom lane
>

I would like to make COPY itself a SRF. That's a bit beyond my
capabilities, so if that is the route we want to go, I will need help.

The syntax would probably look like this (new bits in bold):

WITH my_copy AS (
COPY FROM 'example.csv' TO *RESULT SET(c1 text, c2 integer, dummy1
text, dummy2 text, c5 date)* WITH (FORMAT CSV)
*RETURNING c1, c2, c3*
)
SELECT ...
FROM my_copy
LEFT OUTER JOIN ref_table ...

The RESULT SET (colspecs) bit would be the rsinfo currently used by
copy_srf(). It would be nice if the CTE declaration could take types, but
it can't.

The RETURNING clause here doesn't return all the columns made available
from the COPY. That would be nice, but not required because the same
filtration could be done when the CTE is referenced. So if we require RETURNING
* be the only returning option I'd be fine with that.

If we're ok with adding a function like copy_srf() to the core, will we
still be happy with it when COPY does get a RETURNING clause?

Somewhat off-topic: here's some other observations of a n00b who spent a
fair amount of time looking at the copy.c code.

1. BeginCopyFrom and NextCopyFrom pull attlist/tupdesc info from ->rel,
repeatedly. If we were going to try to leverage that code we'd need to
store those things in a separate cstate member so that we add complexity
only in the initialization of the copy state data struct, pulling the
result structure from rsinfo rather than a relation. There's probably a
minor performance gain to be had in keeping that info around. Refactoring
those two procs to allow for a pre-set attlist/tupdesc would help.

2. NextCopyFrom() checks every single time to see if it's row 0 and if it
should skip this header row. I know a single (row_num == 0 && has_header)
isn't much extra processing, but shouldn't we digest and discard headers
before going into the per-row loop?

3. All the code that handles indexes, constraints, buffering, etc, simply
doesn't apply in the SRF context.

4. The code somewhat needlessly mixes code for the COPY FROM and COPY TO
cases. There's probably a good reason for this, but it made for a lot of
clutter in achieving my very narrow goal.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2016-10-14 21:00:23 Re: btree vacuum and suspended scans can deadlock
Previous Message Tom Lane 2016-10-14 20:44:14 Re: [sqlsmith] Infinite recursion in bitshift