Re: [HACKERS] backslash in psql output

From: Massimo Dal Zotto <dz(at)cs(dot)unitn(dot)it>
To: hackers(at)postgreSQL(dot)org (PostgreSQL Hackers)
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane)
Subject: Re: [HACKERS] backslash in psql output
Date: 1998-10-11 12:24:14
Message-ID: 199810111224.OAA02257@nikita.wizard.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I realize the double-backslash is confusing, but I don't think we can
> > make such a user-visible change at this time. I think we need to open
> > discussion on this issue on the general list, and to include discussion
> > of NULL displays, and any other issues, as well as how to properly
> > output the column separation character if that appears in the data.
> > So, I think we have to put it back to the old way, and open discussion
> > about this after 6.4.
>
> Well, actually there *was* public discussion of this issue, on the
> pgsql-interfaces list around 12/13 August. The consensus was that
> unnecessary backslashing was a bad idea --- in fact, I didn't see
> *anyone* arguing in favor of the old behavior, and the people who
> actually had backslashes in their data definitely didn't want it.
> Admittedly it was a pretty small sample (Tom Lockhart and I were
> two of the primary complainers) but there wasn't any sentiment
> for keeping the old behavior.
>
> Keep in mind that what we are discussing here is the behavior of
> PQprint(), not the behavior of FE/BE transport protocol or anything
> else that affects data received by applications. PQprint's goal in
> life is to present data in a reasonably human-friendly way, *not*
> to produce a completely unambiguous machine-readable syntax. Its
> output format is in fact very ambiguous. Here's an example:
>
> play=> create table test(id int4, val text);
> play=> insert into test values (1, NULL);
> play=> insert into test values (2, ' ');
> play=> insert into test values (3, 'foobar');
> play=> insert into test values (4, 'oneback\\slash');
> play=> insert into test values (5, 'onevert|bar');
> play=> select * from test;
> id|val
> --+-------------
> 1|
> 2|
> 3|foobar
> 4|oneback\slash
> 5|onevert|bar
> (5 rows)
>
> You can't tell the difference between a NULL field and an all-blanks
> value in this format; nor can you really be sure how many trailing
> blanks there are in tuples 3 and 5. So the goal is readability,
> not lack of ambiguity. Given that goal, I don't see the value of
> printing backslash escapes. Are you really having difficulty telling
> the data vertical bar from the ones used as column separators?
> Physical alignment is the cue the eye relies on, I think.
>
> The only cases that PQprint inserted backslashes for were the column
> separator char (unnecessary per above example), newlines (also not
> exactly hard to recognize), and backslash itself. All of these
> seem unnecessary and confusing to me.
>
> I'm sorry that this change sat in my to-do queue for so long, but
> I don't see it as a last-minute thing. The consensus to do it was
> established two months ago.
>
> regards, tom lane
>
>
>

In my opinion we should privilege machine-readableness first and then provide
some user option to enable user-friendly conversion in psql output if one
really needs it.

In situations where data is processed by other programs it is very important
that there is no ambiguity in strings exchanged between the application and
the backend. This is already done for input, which supports C-like escape,
but not yet for output, which can produce ambiguous data when nulls, arrays
or non-printing characters are involved. This is the reason why I always use
my C-like output functions (contrib/string-io) in all my applications.

These arguments apply also to the copy command which uses the same output
functions. Consider the case where a text field contains a multi-line string
with newlines embedded; if you export the table into an external files the
field is split into many lines which are interpreted as separate records by
commonly used line-oriented filters like awk or grep.

I believe that the right way to handle all this stuff is the following:

input:

binary data escaped data
| |
(user conversion) (psql input)
| |
+-----------------------+
|
escaped query
|
(libpq)
|
escaped query escaped data
| |
(parser unescape) (copy-from unescape)
| |
+-----------------------+
|
binary data
|
(input function)
|
internal data

output:

internal data
|
(output function)
|
escaped data
|
+-----------------------+
| |
(libpq) (copy-to)
| |
escaped data escaped data
|
|
+-----------------------+-----------------------+
| | |
(user conversion) (psql output) (psql unescape)
| | |
binary data escaped data binary data

In the above schema binary data means the external representation of data
containing non-printing or delimiters characters like quotes or newlines.
In this schema all the data exchanged with the backend should be escaped
in order to guarantee unambiguity to applications. The input and output
user conversion functions could be provided by libpq as utilities, and the
conversion could possibly be done automatically by libpq itself if some
global flag is set by the application.
Psql input should accept only escaped data while the output could be escaped
(default) or binary depending on a user supplied switch.
Files read or written by the copy command should be always escaped with
exactly one record for line. Pg_dump should produce escaped strings.
All this stuff requires the use of new output functions like those provided
in contrib/string-io.

There is still the problem of distinguishing between scalars and arrays which
is necessary for user output conversion. In my output functions I solved the
problem by escaping the first '{' of each field if it is not an array.
Another problem is that array input requires a double escaping, one for the
query parser and a second one for the array parser. Also nulls (\0) are not
handled by the input code. This should be fixed if we want true binary data.

I don't know if C-escapes violate the ansi sql standard but I believe they
makes life easier for the programmer. And if we add some global flag in
libpq we could also do automatic conversion to be compatible with ansi sql
and old applications. Note that arrays aren't ansi sql anyway.

Anyway a runtime switch is preferable to a configure switch.

--
Massimo Dal Zotto

+----------------------------------------------------------------------+
| Massimo Dal Zotto email: dz(at)cs(dot)unitn(dot)it |
| Via Marconi, 141 phone: ++39-461-534251 |
| 38057 Pergine Valsugana (TN) www: http://www.cs.unitn.it/~dz/ |
| Italy pgp: finger dz(at)tango(dot)cs(dot)unitn(dot)it |
+----------------------------------------------------------------------+

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1998-10-11 17:05:03 Re: [HACKERS] Re: inet/cidr/bind
Previous Message Massimo Dal Zotto 1998-10-11 12:04:47 Re: [HACKERS] TCL_ARRAYS code in libpgtcl is pretty seriously broken