Multibyte still broken

From: Michael Robinson <robinson(at)netrinsics(dot)com>
To: pgsql-hackers(at)hub(dot)org
Subject: Multibyte still broken
Date: 2000-05-10 14:08:19
Message-ID: 200005101408.WAA07324@netrinsics.com
Lists: pgsql-hackers

These are excerpts from a message from Tatsuo Ishii dated January 26, on
the subject of fragile code in the multibyte routines:

---- begin ----
Defensive programming saves the system but does not user. Once
corrupted data is stored in the system, it's totally useless for the
user anyway. What about validating data *before* inserting it into a
table?
---- end ----

---- begin ----
> >Here it is. With this patch, copy out should be happy even with the
> >wrong data. I'm not sure if it could be displayed correctly, though.
>
> Thank you very much. However, I think even this is too optimistic:
>
> >! if (*s & 0x80)
>
> Shouldn't it be something like:
>
> if ((*s & 0x80) && (*(s+1) & 0x80))
>
> Even though "\242\242\242\0" is an invalid EUC sequence, it still shouldn't be
> allowed to break the software.

Thanks for the suggestion. More robust code is always good.
---- end ----

More robust code may always be good, but "good" apparently doesn't always go
into the tree. Imagine my surprise, while upgrading a production server
from 6.5.3 to 7.0, when the data dumped from the old database failed to load
into the new database (well, crashed the backend, to be specific).

Apparently the "validate your own damn data" sentiment of the first excerpt
above has prevailed, because, on inspection, the MB code is just as fragile
as it was five months ago.
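
To make the suggestion in the second excerpt concrete, here is a minimal
sketch of the kind of check I mean. It is not code from the PostgreSQL tree:
the function names are hypothetical, and it assumes a NUL-terminated string in
a two-byte EUC-style encoding where both bytes of a multibyte character have
the high bit set. It tests only the high bit, as in the excerpt, and does not
check the full valid byte ranges of any particular EUC variant.

```c
#include <stddef.h>

/*
 * Hypothetical helper: byte length of the character starting at s.
 * Returns 0 at the terminating NUL, 2 only when a high-bit byte is
 * followed by another high-bit byte, and 1 otherwise. A stray lead
 * byte at the end of the string is therefore consumed as one byte,
 * so a caller advancing by this length never steps past the NUL.
 */
int
safe_euc_mblen(const unsigned char *s)
{
    if (*s == '\0')
        return 0;
    if ((*s & 0x80) && (*(s + 1) & 0x80))
        return 2;
    return 1;
}

/*
 * Hypothetical helper: count characters using safe_euc_mblen, so
 * malformed input yields a count instead of a read past the buffer.
 */
size_t
safe_euc_strlen(const unsigned char *s)
{
    size_t      count = 0;
    int         len;

    while ((len = safe_euc_mblen(s)) > 0)
    {
        s += len;
        count++;
    }
    return count;
}

/*
 * Hypothetical validator: returns 1 if every high-bit byte is paired
 * with a following high-bit byte, 0 otherwise. This is the sort of
 * test that could run *before* data is stored.
 */
int
euc_is_well_formed(const unsigned char *s)
{
    while (*s != '\0')
    {
        if (*s & 0x80)
        {
            if (!(*(s + 1) & 0x80))
                return 0;       /* lead byte with no trail byte */
            s += 2;
        }
        else
            s++;
    }
    return 1;
}
```

With these, the invalid sequence "\242\242\242\0" is counted as two characters
and rejected by the validator, and in neither case is anything read beyond the
terminating NUL.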

I was forced to perform emergency repairs to my database dump file to fool a
non-multibyte 7.0 into accepting it. Since EUC_CN is compatible with
Latin-1, and since the benefits of multibyte are small compared to the
risks, I intend to stick with unibyte Postgres henceforth.

I would, though, recommend a warning in the "INSTALL" file along the lines of:

"WARNING: Use of improperly-encoded text with multi-byte support enabled
WILL lead to data corruption and/or loss. Do not enable multi-byte support
unless you intend to fully validate your own damn data."

-Michael Robinson
