Re: exposing COPY API

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: exposing COPY API
Date: 2011-02-04 13:59:33
Message-ID: 4D4C0645.2010109@dunslane.net
Lists: pgsql-hackers

On 02/04/2011 05:49 AM, Itagaki Takahiro wrote:
> Here is a demonstration to support jagged input files. It's a patch
> on the latest patch. The new added API is:
>
> bool NextLineCopyFrom(
> [IN] CopyState cstate,
> [OUT] char ***fields, [OUT] int *nfields, [OUT] Oid *tupleOid)
>
> It just returns separated fields in the next line. Fortunately, I need
> no extra code for it because it is just extracted from NextCopyFrom().

Thanks, I'll have a look at it, after an emergency job I need to attend
to. But the API looks weird. Why are fields and nfields OUT params? The
issue isn't decomposing the line into raw fields. The code for doing
that works fine as is, including on jagged files. See commit
af1a614ec6d074fdea46de2e1c462f23fc7ddc6f which was done for exactly this
purpose. The issue is taking those and composing them into the expected
tuple.

> I'm willing to include the change into copy APIs,
> but we still have a few issues. See below.
>
> On Fri, Feb 4, 2011 at 16:53, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>> The problem with COPY FROM is that nobody's come up with a good syntax for
>> allowing it as a FROM target. Doing what I want via FDW neatly gets us
>> around that problem. But I'm quite OK with doing the hard work inside the
>> COPY code - that's what my working prototype does in fact.
> I think it is not only a syntax issue. I found that it is hard to
> support the FORCE_NOT_NULL option for extra fields. See FIXME in the patch.
> It is a fundamental problem to support jagged fields.

It's not a problem at all if you turn the line into a text array. That's
exactly why we've been proposing it for this. The array has however many
elements are on the line.

>> One thing I'd like is to have file_fdw do something we can't do another
>> way. Currently it doesn't, so it's nice but uninteresting.
> BTW, how do you determine which field is shifted in your broken CSV file?
> For example, the case where you find "AB,CD,EF" for a 2-column table.
> I could provide a raw CSV reader for jagged files, but you still have to
> cook the returned fields into a proper tuple...
>

See above. My client who deals with this situation and has been doing so
for years treats underflowing fields as null and ignores overflowing
fields. They would do the same if the data were delivered with a text
array. It works very well for them.
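
To make that rule concrete, here is a minimal sketch in plain C. It is
an illustration only, not code from the patch or from my branch:
fit_fields() and its parameters are hypothetical names, and it assumes
the line has already been split into an array of raw field pointers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): map the raw fields
 * of one input line onto a fixed number of columns.
 *
 * Underflow: columns with no matching field are set to NULL.
 * Overflow:  fields beyond the last column are ignored.
 *
 * The column entries point into the caller's field array; nothing is
 * copied or freed here.
 */
void
fit_fields(char **fields, int nfields, char **columns, int ncolumns)
{
    int i;

    assert(nfields >= 0 && ncolumns >= 0);

    for (i = 0; i < ncolumns; i++)
        columns[i] = (i < nfields) ? fields[i] : NULL;
}
```

For the "AB,CD,EF" line against a 2-column table, this keeps AB and CD
and drops EF. The same line against a 4-column table fills three
columns and leaves the fourth null.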

See <https://github.com/adunstan/postgresql-dev/tree/sqlmed2> for my dev
branch on this.

cheers

andrew
