BUG #16236: Invalid escape encoding

From: Stéphane Campinas <stephane(dot)campinas(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: BUG #16236: Invalid escape encoding
Date: 2020-01-30 10:26:17
Message-ID: CAAyNevaL3vLCHVai1vbJQnKp1KY1pMdDchsgB3pFnSPyRoccgw@mail.gmail.com
Lists: pgsql-bugs

Thanks Tom for the reply!

I read the doc once more, and I now understand the "high-bit-set
value" part better ;o)

myDatabaseName=# select encode('\x00017F80', 'escape');
encode
------------------
\000\x01\x7F\200

If I understand correctly, with the input "\x00017F80", I get the
output above because:
- "00" is converted to "\000"
- "01" and "7F" get converted to "\x01" and "\x7F" respectively as they
are not 0 or a high-bit-set value
- "80" is converted to "\200" since it is a high-bit-set value
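
The rules above can be sketched in Python. This is only a rough model, not the PostgreSQL source: encode_escape follows the documented "escape" rule (zero and high-bit-set bytes become \nnn, backslashes are doubled, everything else is copied literally), and psql_display is a hypothetical, simplified stand-in for the psql-side rendering of control characters as \xNN that Tom describes in the quoted reply below.

```python
def encode_escape(data: bytes) -> str:
    """Model of encode(bytea, 'escape') as documented (not the real C code)."""
    out = []
    for b in data:
        if b == 0 or b >= 0x80:
            out.append("\\%03o" % b)   # zero / high-bit-set byte -> \nnn (octal)
        elif b == 0x5C:
            out.append("\\\\")         # backslash is doubled
        else:
            out.append(chr(b))         # any other byte is copied literally
    return "".join(out)


def psql_display(text: str) -> str:
    """Assumption: psql shows ASCII control characters as \\xNN (simplified)."""
    return "".join(
        "\\x%02X" % ord(c) if ord(c) < 0x20 or ord(c) == 0x7F else c
        for c in text
    )


encoded = encode_escape(bytes.fromhex("00017F80"))
print(repr(encoded))          # the bytes 0x01 and 0x7F are still raw here
print(psql_display(encoded))  # what the terminal would show
```

Under this model the \x01 and \x7F in the output come from the display step only; the text returned by encode() holds the raw bytes 0x01 and 0x7F.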

I remember being confused by the hexadecimal values in the output, and
I didn't really get the "high-bit-set" part of the doc.

Do you know why there is this distinction between high-bit-set values
and other non-printable characters?

Also, I still have two more questions.

First, the following is strange: I cannot decode what encode()
returned:

myDatabaseName=# select encode('\x00017F80', 'escape');
encode
------------------
\000\x01\x7F\200
(1 row)

myDatabaseName=# select decode('\000\x01\x7F\200', 'escape');
ERROR: invalid input syntax for type bytea
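
One possible reading of this error, sketched in Python: the string pasted into decode() is the displayed text, which contains a literal backslash followed by "x", and the documentation says decode() with 'escape' accepts a backslash only when it is followed by a second backslash or by three octal digits. decode_escape below is a hypothetical model of that documented rule, not the server code.

```python
def decode_escape(text: str) -> bytes:
    """Model of decode(text, 'escape'): accepts \\nnn (octal) and \\\\ only."""
    out = bytearray()
    i = 0
    while i < len(text):
        c = text[i]
        if c != "\\":
            out += c.encode("utf-8")            # ordinary characters pass through
            i += 1
        elif (i + 3 < len(text)
              and text[i + 1] in "0123"
              and text[i + 2] in "01234567"
              and text[i + 3] in "01234567"):
            out.append(int(text[i + 1:i + 4], 8))  # \nnn -> a single byte
            i += 4
        elif text[i + 1:i + 2] == "\\":
            out.append(0x5C)                    # doubled backslash -> one backslash
            i += 2
        else:
            raise ValueError("invalid input syntax for type bytea")
    return bytes(out)


# The real encode() result (raw 0x01 and 0x7F bytes) round-trips in this model:
print(decode_escape("\\000\x01\x7f\\200").hex())

# The text as displayed by psql (literal "\x01", "\x7F") is rejected:
try:
    decode_escape("\\000\\x01\\x7F\\200")
except ValueError as err:
    print("ERROR:", err)
```

If this model is right, passing the result of encode() straight into decode(), rather than copying it from the terminal, should avoid the error.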

Second, while poking around the code, I found the "bytea_output"
setting. If I set it to "escape", I still get hexadecimal values. Is
that expected?

myDatabaseName=# set bytea_output to escape;
SET
myDatabaseName=# select encode('\x00017F80', 'escape');
encode
------------------
\000\x01\x7F\200
(1 row)

Cheers,

On Mon, Jan 27, 2020 at 06:05:45PM -0500, Tom Lane wrote:
> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > From the documentation [0] about the encode function, the "escape" format
> > should "convert zero bytes and high-bit-set bytes to octal sequences (\nnn)
> > and doubles backslashes."
> > However, executing "select encode(E'aaa\bccc', 'escape');" outputs
> > "aaa\x08ccc", although according to the documentation I should get
> > "aaa\010ccc".
>
> No, I don't think so. The \b gives rise to a byte with hex value 08
> (that is, control-H or backspace) in the E'' literal, which converts
> to the same byte value in the bytea value that gets passed to
> encode(). Since that's not either a zero or a high-bit-set value,
> encode() just repeats it literally in the text result, and you end
> up with the same thing as if you'd just done
>
> =# select E'aaa\bccc'::text;
> text
> ------------
> aaa\x08ccc
> (1 row)
>
> I think it must be psql itself that's choosing to represent the
> backspace as \x08, because nothing in the backend does that.
> (pokes around ... yeah, it's pg_wcsformat() that's doing it)
>
> You could certainly make an argument that encode() ought to
> backslashify all ASCII control characters, not only \0. But
> it's behaving as documented, AFAICS.
>
> regards, tom lane

--
Campinas Stéphane
