Re: compiler warnings on the buildfarm

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: compiler warnings on the buildfarm
Date: 2007-07-13 02:56:20
Message-ID: 8374.1184295380@sss.pgh.pa.us
Lists: pgsql-hackers

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> animal: lionfish warnings: 16
> scan.l:180: warning, the character range [<80>-<FF>] is ambiguous in a
> case-insensitive scanner
> scan.l:180: warning, the character range [<80>-<FF>] is ambiguous in a
> case-insensitive scanner
> scan.l:302: warning, the character range [<80>-<FF>] is ambiguous in a
> case-insensitive scanner

This is evidently complaining about plpgsql's scan.l, which specifies
	%option case-insensitive
and then defines
	ident_start	[A-Za-z\200-\377_]
which is the way we do it in the main grammar too.  But I've never
seen this message in any of the flex versions I've used with PG.
(Which flex version is installed on lionfish anyway?)

I find some relevant points in the flex manual:
http://flex.sourceforge.net/manual/Patterns.html

    Character classes are expanded immediately when seen in the flex
    input. This means the character classes are sensitive to the locale in
    which flex is executed, and the resulting scanner will not be sensitive
    to the runtime locale. This may or may not be desirable.

    Character classes with ranges, such as `[a-Z]', should be used with
    caution in a case-insensitive scanner if the range spans upper or
    lowercase characters. Flex does not know if you want to fold all upper
    and lowercase characters together, or if you want the literal numeric
    range specified (with no case folding). When in doubt, flex will assume
    that you meant the literal numeric range, and will issue a warning. The
    exception to this rule is a character range such as `[a-z]' or `[S-W]'
    where it is obvious that you want case-folding to occur.

What I suspect is happening is that lionfish is running the buildfarm
script in a non-C locale, in which flex finds that some high-bit-set
characters are case-folded by tolower() and accordingly issues this
complaint.  Now the statements that "it assumes you meant the literal
numeric range" and that the behavior is fully determined at compile time
(i.e., there are no run-time invocations of tolower(), and indeed none
are to be seen in pl_scan.c) seem to mean that we'll get the behavior we
want anyway.  But the warning is a bit nervous-making.

I wonder if it'd be a good idea to invoke flex with a command like
	LANG=C flex ...
to try to improve the odds that it sees C locale when it's figuring
out what "case insensitive" means.
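
One way to test the locale theory on a given machine would be a small
probe like the following.  This is only a sketch: it assumes flex's
notion of case folding comes from the C library's tolower()/toupper(),
which is a suspicion and not something verified here, and the function
name count_case_folded_high_bytes is made up for illustration.

```c
#include <ctype.h>
#include <locale.h>

/*
 * Hypothetical helper, not part of PostgreSQL or flex.
 *
 * Count the byte values in 0x80..0xFF that tolower() or toupper() would
 * change under the given LC_CTYPE locale.  Pass "" to pick up the locale
 * from the environment (LC_ALL / LC_CTYPE / LANG).  Returns -1 if the
 * requested locale cannot be set.
 */
int
count_case_folded_high_bytes(const char *locale_name)
{
    int count = 0;
    int c;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    /* Values 0..255 are valid arguments to the <ctype.h> functions. */
    for (c = 0x80; c <= 0xFF; c++)
    {
        if (tolower(c) != c || toupper(c) != c)
            count++;
    }
    return count;
}
```

Called with "C" this should report zero, since the C locale only
case-maps the ASCII letters.  A nonzero result for "" in the buildfarm
environment would be consistent with the theory above.  Note that LANG
is overridden by LC_ALL and LC_CTYPE when those are set, so the probe
looks at whatever setlocale() actually resolves to.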

Anyone want to look into it more closely?

regards, tom lane
