Re: Unicode UTF-8 table formatting for psql text output

From: Roger Leigh <rleigh(at)codelibre(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unicode UTF-8 table formatting for psql text output
Date: 2009-10-26 22:58:47
Message-ID: 20091026225847.GA11903@codelibre.net
Lists: pgsql-hackers

On Mon, Oct 26, 2009 at 01:33:19PM -0400, Tom Lane wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > While i agree this looks nicer I wonder what it does to things like
> > excel/gnumeric/ooffice auto-recognizing table layouts and importing
> > files. I'm not sure our old format was so great for this so maybe this
> > is actually an improvement I'm asking for.
>
> Yeah. We can do what we like with the UTF8 format but I'm considerably
> more worried about the aspect of making random changes to the
> plain-ASCII output. On the other hand, we changed that just a release
> or so ago (to put in the multiline output in the first place) and
> I didn't hear complaints about it that time.

I checked the following (using strace):

gnumeric (via libgda and gnome-database-properties)
openoffice (oobase)

Both spreadsheets require a connection to be set up first for them to
use as a handle, so I did that and traced from there. Neither made
any use of psql; they both appear to use libpq via their respective
database abstraction libs--no forking of any children observed.

Excel is a bit tougher. I bought my first copy last week for other
reasons, but I lack both the Windows expertise and the debugging tools
to trace things. I also dual boot my computer, with the postgres install
on the Linux partition, which makes connecting to the database rather
hard! I think someone else is better suited to check this one!

On a related note, there's something odd with the pager code. The output
of \l with the pager off:

rleigh=# \l
List of databases
Name │ Owner │ Encoding │ Collation │ Ctype │ Access privileges
─────────────────┼──────────┼──────────┼─────────────┼─────────────┼───────────────────────
[...]

(header line is 91 characters, 273 bytes)

And with the pager on:

rleigh=# \l
List of databases
Name │ Owner │ Encoding │ Collation │ Ctype │ Access privileges
─────────────────┼──────────┼──────────┼─────────────┼─────────────┼─────────────────
��─────
[...]

(longest header line 85 characters, 255 bytes, 256 bytes inc. LF,
remainder on second line)

Note that the pager wasn't required and so wasn't actually invoked, but
the output was corrupted. A newline was inserted almost at the end of
the line, and the continuation lacks its leading \342 byte, which (since
these UTF-8 sequences are all three bytes long) leaves two bytes that
are invalid UTF-8. Since this spurious newline was inserted exactly on a
256-byte boundary, I was wondering if there was some buffer, either
internal to psql or in the termios/pty layer, that was getting flushed.
The first byte of the second line was lost (possibly swapped for the \n).
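
To make the byte arithmetic above concrete, here is a minimal Python
sketch. It is not psql code, and the helper names (byte_length,
is_valid_utf8) are made up for illustration. It assumes every character
in the rule line is a three-byte box-drawing character such as U+2500,
which is consistent with the character and byte counts quoted above.

```python
# Hypothetical illustration only; none of this comes from the psql source.

# U+2500 BOX DRAWINGS LIGHT HORIZONTAL, the rule character in the output.
RULE = "\u2500"

# In UTF-8 it is three bytes, with lead byte 0xE2 (octal \342).
ENCODED = RULE.encode("utf-8")


def byte_length(n_chars: int) -> int:
    """Bytes needed for a rule line of n_chars three-byte characters."""
    return len((RULE * n_chars).encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 91 characters with the pager off, 85 before the spurious newline.
full_line_bytes = byte_length(91)
split_line_bytes = byte_length(85)

# A continuation line that has lost its leading 0xE2 byte: two orphaned
# continuation bytes followed by five intact characters.
damaged = ENCODED[1:] + ENCODED * 5

print(full_line_bytes, split_line_bytes, split_line_bytes + 1)
print(is_valid_utf8(ENCODED * 6), is_valid_utf8(damaged))
```

The 85 characters account for 255 bytes, so the inserted newline is the
256th byte, and the two bytes left over from the truncated sequence
cannot be decoded on their own.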

Another weirdness: it only happens if the terminal is wider than 85
columns; otherwise it just wraps around as one would expect!
AFAICT there are no 255- or 256-byte buffers in the code, and the code
doing the printing is just doing stdio to fout, which is either stdout
or a pipe! Because of this, I can't see how the spurious \n appears
in the middle of a simple loop. If border=2, you'll see this for all
of the top, middle and bottom rule lines.

I do see strace showing some termios fiddling. Could that be at fault,
or is that just readline/ncurses initialisation?

Regards,
Roger

--
.''`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.
