Re: Re: Toast,bytea, Text -blob all confusing

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Joe Conway <joseph(dot)conway(at)home(dot)com>
Cc: Alex Pilosov <alex(at)acecape(dot)com>, Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Toast,bytea, Text -blob all confusing
Date: 2001-09-04 03:19:19
Message-ID: 200109040319.f843JJ016358@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Thanks you your description, I have added a bytea section to the docs.

Patch attached.

> > > >For bytea, follow this rule: to escape a null character, use this:
> > > >'\\0'. To escape a backslash, use this: '\\\\'.
> > > >
> > > >Same idea to unescape data.
> > >
> > > Are there other characters that need to be escaped? I suspect there are
> > > more characters that need to be escaped - ctrl chars? single quotes?.
> Why
> > > four backslashes for one? Is there a definitive documentation anywhere
> for
> > > what bytea is _supposed_ (not what it might actually be) to be and how
> it
> > > is to be handled?
> >
> > Yes, sorry for being unclear on this one. Here's a more detailed
> > explanation: Bytea is just a stream of data. On input, it must follow C
> > escaping conventions, on output, it will be escaped using C escaping
> > conventions.
> >
> > However, there's a trap: before things get to bytea input handler, they
> > are first processed by postgresql general parser. Hence, the string \\0
> > given from psql will become \0 when it gets to bytea input handler. String
> > \\\\ will become \\. All non-printable characters must be escaped like
> > this: \\(octal of character), for ex, chr(255) must be presented as \\377.
> > (If you want, you can also use this as an alternative and more generic way
> > to escape a backslash, \\134). Single quote must be escaped either as \\47
> > or as \'. Note the single backslash. Why only one? Because bytea parser
> > doesn't care about single quotes and you only need to escape it for the
> > postgresql parser...
> >
> > So, just keep in mind the double-parsing of input and you'll be safe.
> >
> > > Also why wouldn't escaping stuff like this work with TEXT then? If a
> null
> > > is going to be backslash backslash zero, and come out the same way, it
> sure
> > > looks like TEXT to me :). OK so there's this thing about storage. So
> maybe
> > Because text is null-terminated, can't have a null inside.
> >
> > > I could save a byte by just converting nulls to backslash zero and real
> > > backslashes to backslash backslash. Tada.
> > If you do that, you'll break ordering/comparison. Bytea in memory is
> > stored EXACTLY the way input string was, without any escaping, hence, all
> > comparisons will be correct ( '\\0'::bytea is less than '\\1'::bytea).
> >
> > With your representation, comparisons will fail, because in memory, data
> > is escaped using some escaping convention that you made up.
> >
> > > OK it's probably not the same, but having to put four backslashes when
> two
> > > should be enough to quote one makes me rather puzzled and uneasy.
> > Double parsing, hence double escaping.
>
> Great explanation Alex --thanks! I'll add a bit:
>
> I've done about 400,000 inserts and subsequent queries to verify that, from
> PHP at least, only four charaters need to be escaped. The inserts were 20
> byte strings gotten by concatenating some known text with a counter in a
> loop, and then producing a SHA-1 hash. This produces very uniformly
> distributed binary data. Then I ran the same loop, except I queried for the
> inserted strings. I'm quite confident from this experiment that binary can
> reliably be inserted via standard SQL when these four characters are
> escaped. Here's the run down:
>
> \\000 First slash is consumed by the general parser, leaving \000 for the
> byteain function. If you only use one '\', the general parser converts the
> character into a true '\0' byte, and the byteain function sees this byte as
> the string terminator and stops. This causes the input string to be
> truncated (which seems to confuse many people).
>
> \\012 In my early tests 0x0a (LF) was getting converted to 0x20 (space).
> I think this was happening during PHP's parsing, but I'm still not sure.
> I'll dig into this some more later.
>
> \\047 As Alex mentioned, the byteain function doesn't treat this as a
> special character, but of course the general parser does as this is a single
> quote. It also works fine to escape it as \', I just prefer to use all
> octals.
>
> \\134 Both the general parser and the byteain function see this (a single
> \) as the special escape character. Therefore the general parser turns \\\\
> into \\, and the byteain function turns \\ into \ for actual storage. Again,
> I prefer to use the octal representation instead.
>
> I hope this helps reduce the concerns and confusion over bytea. If anyone
> can help explain why my linefeeds were getting converted to spaces, all the
> mysteries would be explained ;-)
>
> -- Joe
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

Attachment Content-Type Size
unknown_filename text/plain 2.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2001-09-04 03:25:29 Re: Bytea/Base64 encoders for libpq - interested?
Previous Message Tom Lane 2001-09-04 03:09:29 Re: Another pgindent request