Re: csv format for psql

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Daniel Verite <daniel(at)manitou-mail(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: csv format for psql
Date: 2018-11-26 01:26:47
Message-ID: 32765.1543195607@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
> Basically the proposed patch addresses a simple and convenient use case
> which are neither addressed by \copy nor COPY. The fact that more options
> are available with these commands does it precludes its usefulness as is.

Yeah, I agree that this option is useful independently of whether COPY
provides something similar. I think the killer argument is that right
now, psql-ref.sgml repeatedly suggests that unaligned mode with fieldsep
',' is a reasonable way to produce comma-separated output for consumption
by other programs. That's like handing our users a loaded foot-gun.
And, in fact, right now *none* of psql's table output formats is both
unambiguous and reasonably simple/popular to use. So the astonishing
thing about this patch, IMO, is that we didn't do it a decade ago.

I went through the documentation to improve that point, and did some other
cosmetic cleanup including rebasing up to HEAD, and got the attached.

I think there are two remaining points to settle:

1. Are we limiting the separator to be a single-byte character or not?
If we are, the code could be made simpler and faster by working with a
"char" value instead of a string. If we're not, then Michael's change
needs to be undone (but I didn't do that here).

I feel that if we allow multi-byte characters here, we might as well
take the training wheels off and just say you can use any separator
string you want, as long as it doesn't contain double quote, \r, or \n.
Most programs reading a file are not going to perceive a difference
between separating fields with a single multibyte character and multiple
single-byte characters; either they can cope or they can't. A fair
number of them are going to be in the latter category. So we can either
say "do what you wish, it's your problem whether anything can read the
result" or "we're going to restrict you to something that (perhaps) is
more widely readable". I'm a bit inclined to the former viewpoint.
If we were in the business of being restrictive, why would we allow
the field separator to be changed at all? The name of the format is
*comma* separated values, not something else.

2. Speaking of the field separator, I'm pretty desperately unhappy
with the choice of "fieldsep_csv" as the parameter name. The trouble
with that is that it encourages sticking "fieldsep_csv" in between
"fieldsep" and "fieldsep_zero", because alphabet. But "fieldsep" and
"fieldsep_zero" are *intimately* tied together, in fact it's only a
dubious implementation choice that made them separate parameters at all.
It does not make any semantic sense to stick other vaguely-related
parameters in between, neither in the documentation nor in \pset output.

We could avoid this self-inflicted confusion by choosing a different
parameter name. I'd be good with "csv_fieldsep" or "csvfieldsep".

Or we could kill both issues by hard-wiring the separator as ','.

Thoughts?

regards, tom lane

Attachment Content-Type Size
psql-csv-format-v11.patch text/x-diff 28.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-11-26 01:52:30 Re: shared-memory based stats collector
Previous Message Michael Paquier 2018-11-26 00:35:56 Re: SSL tests failing with "ee key too small" error on Debian SID