
Re: Ragged CSV import

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Ragged CSV import
Date:	2009-09-09 20:34:29
Message-ID:	20090909203428.GW4132@alvh.no-ip.org
Lists:	pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> >> I have received a requirement for the ability to import ragged CSV
> >> files, i.e. files that contain variable numbers of columns per row.
>
> BTW, one other thought about this: I think the historical reason for
> COPY being strict about the number of incoming columns was that it
> provided a useful cross-check that the parsing hadn't gone off into
> the weeds. We have certainly seen enough examples where the reported
> manifestation of, say, an escaping mistake was that COPY saw the row
> as having too many or too few columns. So being permissive about it
> would lose some error detection capability. I am not clear about
> whether CSV format is sufficiently more robust than the traditional
> COPY format to render this an acceptable loss. Comments?

I think accepting fewer columns and filling the missing ones with nulls
should be safe enough for this not to be a problem; if the parser goes
nuts, it will die eventually. Silently dropping excess trailing columns
does not seem acceptable, though; you could lose entire rows and not
notice.
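To make the proposed rule concrete, here is a minimal sketch of it outside PostgreSQL, assuming Python's standard `csv` module as the parser and `None` as a stand-in for SQL NULL. The function name `load_ragged` and its `ncols` parameter are hypothetical, made up for illustration; this is not how COPY is implemented. Short records are padded with nulls, while records with too many fields raise an error instead of being truncated, so the column-count cross-check survives in one direction.

```python
import csv
import io


def load_ragged(text, ncols):
    """Hypothetical helper: parse CSV text into rows of exactly ncols fields.

    Rows with fewer than ncols fields are padded with None (standing in
    for SQL NULL). Rows with more than ncols fields raise ValueError
    rather than having their trailing fields silently dropped.
    """
    rows = []
    reader = csv.reader(io.StringIO(text))
    for recno, fields in enumerate(reader, start=1):
        if len(fields) > ncols:
            # Too many fields usually means a quoting/escaping mistake;
            # fail loudly instead of discarding data.
            raise ValueError(
                f"record {recno}: expected at most {ncols} columns, "
                f"got {len(fields)}"
            )
        # Pad short rows with nulls for the missing trailing columns.
        rows.append(fields + [None] * (ncols - len(fields)))
    return rows
```

For example, `load_ragged("1,a,x\n2,b\n3\n", 3)` yields `[['1', 'a', 'x'], ['2', 'b', None], ['3', None, None]]`, whereas an unquoted embedded comma such as `5,O'Brien, Jr.,y` with `ncols=3` produces four fields and is rejected rather than losing the trailing `y`.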

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

In response to

Re: Ragged CSV import at 2009-09-09 20:27:05 from Tom Lane

Responses

Re: Ragged CSV import at 2009-09-09 20:56:01 from Hannu Krosing
