Re: Toast,bytea, Text -blob all confusing

From: Alex Pilosov <alex(at)acecape(dot)com>
To: Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Toast,bytea, Text -blob all confusing
Date: 2001-08-28 13:08:01
Message-ID: Pine.BSO.4.10.10108280854010.19482-100000@spider.pilosoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 28 Aug 2001, Lincoln Yeoh wrote:

> >For bytea, follow this rule: to escape a null character, use this:
> >'\\0'. To escape a backslash, use this: '\\\\'.
> >
> >Same idea to unescape data.
>
> Are there other characters that need to be escaped? I suspect there are
> more characters that need to be escaped - ctrl chars? single quotes?. Why
> four backslashes for one? Is there a definitive documentation anywhere for
> what bytea is _supposed_ (not what it might actually be) to be and how it
> is to be handled?

Yes, sorry for being unclear on this one. Here's a more detailed
explanation: Bytea is just a stream of data. On input, it must follow C
escaping conventions, on output, it will be escaped using C escaping
conventions.

However, there's a trap: before things get to bytea input handler, they
are first processed by postgresql general parser. Hence, the string \\0
given from psql will become \0 when it gets to bytea input handler. String
\\\\ will become \\. All non-printable characters must be escaped like
this: \\(octal of character), for ex, chr(255) must be presented as \\377.
(If you want, you can also use this as an alternative and more generic way
to escape a backslash, \\134). Single quote must be escaped either as \\47
or as \'. Note the single backslash. Why only one? Because bytea parser
doesn't care about single quotes and you only need to escape it for the
postgresql parser...

So, just keep in mind the double-parsing of input and you'll be safe.

> Also why wouldn't escaping stuff like this work with TEXT then? If a null
> is going to be backslash backslash zero, and come out the same way, it sure
> looks like TEXT to me :). OK so there's this thing about storage. So maybe
Because text is null-terminated, can't have a null inside.

> I could save a byte by just converting nulls to backslash zero and real
> backslashes to backslash backslash. Tada.
If you do that, you'll break ordering/comparison. Bytea in memory is
stored EXACTLY the way input string was, without any escaping, hence, all
comparisons will be correct ( '\\0'::bytea is less than '\\1'::bytea).

With your representation, comparisons will fail, because in memory, data
is escaped using some escaping convention that you made up.

> OK it's probably not the same, but having to put four backslashes when two
> should be enough to quote one makes me rather puzzled and uneasy.
Double parsing, hence double escaping.

--
Alex Pilosov | http://www.acedsl.com/home.html
CTO - Acecape, Inc. | AceDSL:The best ADSL in the world
325 W 38 St. Suite 1005 | (Stealth Marketing Works! :)
New York, NY 10018 |

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Reinoud van Leeuwen 2001-08-28 13:09:06 Re: performance question
Previous Message Oliver Elphick 2001-08-28 12:50:57 INTERVAL type: SQL92 implementation