Re: COPY formatting

From: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY formatting
Date: 2004-03-19 10:50:21
Message-ID: 20040319105021.GB16735@zf.jcu.cz
Lists: pgsql-hackers

On Thu, Mar 18, 2004 at 10:16:36AM -0500, Tom Lane wrote:
> Passing in a relation OID is probably a bad idea anyway, as it ties this
> API to the assumption that COPY is only for complete relations. There's
> been talk before of allowing a SELECT result to be presented via the
> COPY protocol, for instance. What might be a more usable API is
>
> COPY OUT:
> function formatter_out(text[]) returns text
> COPY IN:
> function formatter_in(text) returns text[]
>
> where the text array is either the results of or the input to the
> per-column datatype I/O routines. This makes it explicit that the
> formatter's job is solely to determine the column-level wrapping and
> unwrapping of the data. I'm assuming here that there is no good reason
> for the formatter to care about the specific datatypes involved; can you
> give a counterexample?

The idea was to pass the maximum information about the tuple to the
formatter; what the formatter does with that information is the
formatter's problem.
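
For illustration only, the text[]-based pair quoted above could look
roughly like the sketch below. It is written in Python just to show the
data flow and is not PostgreSQL code; the "|" delimiter, the backslash
escaping and the lack of NULL handling are assumptions of this sketch.

```python
DELIM = "|"   # hypothetical delimiter, chosen only for this sketch
ESC = "\\"    # hypothetical escape character


def formatter_out(columns):
    """COPY OUT direction: text[] -> one line of text, newline included."""
    escaped = []
    for col in columns:
        # Escape the escape character first, then the delimiter.
        col = col.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM)
        escaped.append(col)
    return DELIM.join(escaped) + "\n"


def formatter_in(line):
    """COPY IN direction: one complete line of text -> text[]."""
    if line.endswith("\n"):
        line = line[:-1]
    columns, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == ESC and i + 1 < len(line):
            # Escaped character: take the next character literally.
            current.append(line[i + 1])
            i += 2
        elif ch == DELIM:
            columns.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    columns.append("".join(current))
    return columns
```

Neither function sees a relation OID or a datatype; both work only on
the already-converted column strings.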

> > It's pity that main idea of current COPY is based on separated lines
> > and it is not more common interface for streaming data between FE and BE.
>
> Yeah, that was another concern I had. This API would let the formatter
> control line-level layout but it would not eliminate the hard-wired
> significance of newline. What's worse, there isn't any clean way to
> deal with reading quoted newlines --- the formatter can't really replace
> the default quoting rules if the low-level code is going to decide
> whether a newline is quoted or not.

I think the latest protocol version works with blocks of data and not
with lines, and the client's PQputCopyData() returns a block -- only
the docs say that it is a row of the table.

> We could possibly solve that by specifying that the text output or input
> (respectively) is the complete line sent to or from the client,
> including newline or whatever other line-level formatting you are using.
> This still leaves the problem of how the low-level COPY IN code knows
> what is a complete line to pass off to the formatter_in routine. We
> could possibly fix this by adding a second input-control routine
>
> function formatter_linelength(text) returns integer
>
> which is defined to return -1 if the input isn't a complete line yet

But I think formatter_linelength() will need some context information,
in other words a struct with formatter-specific internal data. And for
more difficult formats like XML you also need other context data
(parser data).

Maybe there could be a global exported struct (as for triggers) that
functions written in C can use. That means for simple formats like CSV
you can use non-C functions, and for formats like XML you can use C
functions. And if it is interesting for PL developers, they can add
support for access to this struct to their languages.
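
A rough sketch of why the context matters, in Python for brevity. The
FormatterContext type and its fields are hypothetical, and the extra
ctx argument is an assumption that is not part of the signature
proposed above. The context remembers how far the buffer was scanned
and whether the scan stopped inside a double-quoted field, so a
newline inside quotes is not taken as the end of the line.

```python
from dataclasses import dataclass


@dataclass
class FormatterContext:
    """Hypothetical per-COPY state kept between formatter_linelength calls."""
    scanned: int = 0          # bytes of the buffer already examined
    in_quotes: bool = False   # True while inside a double-quoted field


def formatter_linelength(buf, ctx):
    """Return the length of the first complete line in buf, or -1."""
    i = ctx.scanned
    while i < len(buf):
        ch = buf[i:i + 1]
        if ch == b'"':
            # A doubled quote ("") toggles twice, so the state stays correct.
            ctx.in_quotes = not ctx.in_quotes
        elif ch == b"\n" and not ctx.in_quotes:
            # Complete line found: reset the state for the next line.
            ctx.scanned = 0
            ctx.in_quotes = False
            return i + 1
        i += 1
    ctx.scanned = i   # resume here once more data has been appended
    return -1
```

The caller is expected to remove the returned number of bytes from the
front of the buffer before calling the function again.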

> (i.e., read some more data, append to the buffer, and try again), or
> >= 0 to indicate that the first N bytes of the buffer represent a
> complete line to be passed off to formatter_in. I don't see a way to
> combine formatter_in and formatter_linelength into a single function
> without relying on "out" parameters, which would again confine the
> feature to format functions written in C.
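
The read-append-retry loop described above might be driven like this.
It is only a Python sketch of the contract (-1 means incomplete, N
means the first N bytes are a line); split_lines and the trivial
newline-based detector are hypothetical stand-ins, not existing
backend code.

```python
def formatter_linelength(buf):
    """Stand-in line detector: handles newline-terminated lines only."""
    pos = buf.find(b"\n")
    return -1 if pos < 0 else pos + 1


def split_lines(chunks, linelength=formatter_linelength):
    """Yield complete lines from an iterable of arbitrary-sized byte chunks."""
    buf = b""
    for chunk in chunks:
        buf += chunk              # read some more data, append to the buffer
        while True:
            n = linelength(buf)
            if n < 0:
                break             # not a complete line yet, so try again later
            if n == 0:
                # A zero-length line would never consume input; this sketch
                # treats it as an error rather than looping forever.
                raise ValueError("formatter returned an empty line")
            yield buf[:n]         # this is what would go to formatter_in
            buf = buf[n:]
    if buf:
        raise ValueError("input ended in the middle of a line")
```

Chunk boundaries do not have to match line boundaries; the buffer
simply grows until the detector reports a complete line.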

> It's a tad annoying that we need two functions for input. One way that
> we could still keep the COPY option syntax to be just
> FORMAT csv
> is to create an arbitrary difference in the signatures of the input
> functions. Then we could have coexisting functions
> csv(text[]) returns text
> csv(text) returns text[]
> csv(text, ...) returns int
> that are referenced by "FORMAT csv".

It sounds good, but I think neither of us is fully sure about it now,
right? CSV support is probably better added by extending DELIMITER.

Karel

--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
http://home.zf.jcu.cz/~zakkr/
