Re: csv format for psql

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Fabien COELHO" <coelho(at)cri(dot)ensmp(dot)fr>,"Michael Paquier" <michael(at)paquier(dot)xyz>,"PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: csv format for psql
Date: 2018-11-26 11:40:31
Message-ID: a6fae849-f248-4bf4-ba4c-23a7b11a5105@manitou-mail.org
Lists: pgsql-hackers

Tom Lane wrote:

> And, in fact, right now *none* of psql's table output formats is both
> unambiguous and reasonably simple/popular to use. So the astonishing
> thing about this patch, IMO, is that we didn't do it a decade ago.

Yeah, that's what motivated this submission in the first place.

> I feel that if we allow multi-byte characters here, we might as well
> take the training wheels off and just say you can use any separator
> string you want, as long as it doesn't contain double quote, \r, or \n.

The reason behind disallowing multiple characters was the
likelihood of them being mistakes. For instance, this example came
up at some point in the discussion:
  \pset fieldsep_csv ,,
To me the probability that a user has fat-fingered this is pretty high,
and this would silently produce a really bogus file.

Another kind of mistake comes from the difficulty of properly
quoting on the command line:
  psql --csv -P fieldsep_csv='\t'
would be interpreted as a two-character separator (a backslash
followed by "t") despite being obviously not the user's intention.
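
To make the two mistakes above concrete, here is a minimal sketch in
Python of the single-character rule. It is not psql's implementation;
check_fieldsep is a hypothetical name used only for illustration, and
the rejected characters are the ones listed in the quoted text above.

```python
def check_fieldsep(sep: str) -> str:
    """Hypothetical validator: accept exactly one character as separator.

    Illustrates the rule under discussion; not taken from psql's source.
    """
    # Reject empty or multi-character values such as ",," or a literal
    # backslash followed by "t" (what '\t' is inside single shell quotes).
    if len(sep) != 1:
        raise ValueError(f"separator must be exactly one character, got {sep!r}")
    # These characters would make the CSV output ambiguous.
    if sep in ('"', "\r", "\n"):
        raise ValueError(f"separator cannot be {sep!r}")
    return sep


# Both mistakes from the examples are two characters long:
print(len(",,"))    # the fat-fingered separator
print(len(r"\t"))   # backslash + t, not a tab character
```

Note that the length test counts characters, not bytes, so a single
non-ASCII character passes this check.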

About disallowing characters beyond US-ASCII, I can't find a similar
justification. COPY does not allow them, but it's justified (in the
archives) by the fear of being slower when importing, which is not a
concern here.

> We could avoid this self-inflicted confusion by choosing a different
> parameter name. I'd be good with "csv_fieldsep" or "csvfieldsep".

+1

> Or we could kill both issues by hard-wiring the separator as ','.

Ideally people would understand that they can use -A for any delimiter
but no quoting, or --csv with strict quoting. In the latter case a fixed
delimiter is fine: since fields are safely quoted, the delimiter's
presence in the data is not a problem. But I'm not too confident that
everyone would understand it that way, even if it's well explained in
the doc.
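
As a rough illustration of why quoting makes a fixed delimiter safe,
the sketch below uses Python's standard csv module as a stand-in (it
does not run psql). A plain join plays the role of delimiter-only
output without quoting, and csv.writer plays the role of strictly
quoted CSV; the sample row is made up.

```python
import csv
import io

# A made-up row whose second field contains the delimiter.
row = ["1", "Smith, John", "x"]

# No quoting: fields are simply joined with the delimiter.
unquoted = ",".join(row)

# CSV quoting: fields containing the delimiter are wrapped in double quotes.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)
quoted = buf.getvalue().rstrip("\n")

# Reading back: the unquoted line splits into too many fields,
# while the quoted line round-trips to the original row.
parsed_unquoted = unquoted.split(",")
parsed_quoted = next(csv.reader(io.StringIO(quoted)))

print(len(parsed_unquoted), parsed_unquoted)
print(len(parsed_quoted), parsed_quoted)
```

Without quoting, the only way to keep the line unambiguous is to pick a
delimiter that never occurs in the data; with quoting, the comma can
stay fixed.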

When one is told "please produce CSV files with semi-colons as
separators", it's simpler to just produce that rather than arguing
that these requirements are probably ill-advised.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
