From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Tightening isspace() tests where behavior should match SQL parser |
Date: | 2017-05-20 17:48:00 |
Message-ID: | 10129.1495302480@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
The proximate cause of bug #14662,
https://www.postgresql.org/message-id/flat/20170519162316.29945.5021%40wrigleys.postgresql.org
appears to be that SplitIdentifierString's isspace() checks are
identifying some bytes of a multibyte character as spaces. It's not
quite clear to me whether there's an encoding configuration problem
involved in this specific report, but in any case feeding individual
bytes of a multibyte character to <ctype.h> functions is highly dubious.
The best you can say about that is that the behavior will be
platform-dependent.
I think that the easiest way to resolve this is to replace the isspace()
calls with scanner_isspace(), on the grounds that we are trying to parse
identifiers the same way that the core SQL lexer would, and therefore we
should use its definition of whitespace.
I went looking through all our other calls of isspace() to see if we
have the same problem anywhere else, and identified these functions
as being at risk:
parse_ident()
regproc.c's parseNameAndArgTypes (for regprocedurein and siblings)
SplitIdentifierString()
SplitDirectoriesString()
In all four cases, we must allow for non-ASCII input and it's
arguable that correct behavior is to match the core lexer,
so I propose replacing isspace() with scanner_isspace() in
these places.
There are quite a lot more isspace() calls than that of course,
but the rest of them seem probably all right to me. As an
example, I don't see a need to make float8in() stricter about
what it will allow as trailing whitespace. We're not expecting
any non-ASCII input really, and if we do see multibyte characters
and all the bytes manage to pass isspace(), not much harm is done.
Attached is a proposed patch. I'm vacillating on whether to
back-patch this --- it will fix a reported bug, but it seems
at least somewhat conceivable that it will also break cases
that were working acceptably before. Thoughts?
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
tighten-whitespace-checks.patch | text/x-diff | 4.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2017-05-20 17:58:57 | Re: proposal psql \gdesc |
Previous Message | Andres Freund | 2017-05-20 17:46:28 | Re: Making replication commands case-insensitive |