
Re: Inaccurate documentation about identifiers

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Brennan Vincent <brennan(at)umanwizard(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: Inaccurate documentation about identifiers
Date:	2022-11-17 20:01:10
Message-ID:	1954348.1668715270@sss.pgh.pa.us
Lists:	pgsql-bugs

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Wed, 2022-11-16 at 08:36 -0500, Brennan Vincent wrote:
>> However, it seems that all non-ASCII characters are considered
>> "letters"

> You're correct: it seems to allow any byte with the high bit set,
> including, for example, a zero-width space.

Yes, see scan.l:

ident_start [A-Za-z\200-\377_]
ident_cont [A-Za-z\200-\377_0-9\$]

identifier {ident_start}{ident_cont}*
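To see why a zero-width space gets through, here is a minimal Python sketch that transcribes the two character classes above into a bytes regex. It is an illustration, not PostgreSQL code: the function name lex_identifier is hypothetical, and the sketch ignores everything else the real lexer does (downcasing, quoted identifiers, keyword lookup, length truncation). The octal ranges \200-\377 are bytes 0x80-0xFF, i.e. any byte with the high bit set, so every byte of a multibyte UTF-8 character qualifies regardless of what that character is.

```python
import re
from typing import Optional

# Hypothetical byte-level transcription of the scan.l rules.
# Octal \200-\377 in flex == bytes 0x80-0xFF (high bit set).
IDENT_START = rb"[A-Za-z\x80-\xff_]"
IDENT_CONT = rb"[A-Za-z\x80-\xff_0-9$]"
IDENTIFIER = re.compile(IDENT_START + IDENT_CONT + rb"*")


def lex_identifier(data: bytes) -> Optional[bytes]:
    """Return the longest identifier prefix of data, or None if none."""
    m = IDENTIFIER.match(data)
    return m.group(0) if m else None


# U+200B ZERO WIDTH SPACE is b"\xe2\x80\x8b" in UTF-8: all three bytes
# have the high bit set, so the byte rule treats them as "letters".
zwsp = "\u200b".encode("utf-8")
print(lex_identifier(b"a" + zwsp + b"b"))  # whole thing is one identifier
print(lex_identifier(b"foo bar"))          # stops at the ASCII space
print(lex_identifier(b"1abc"))             # None: digits can't start one
```

Because the check never decodes anything, it is cheap and encoding-agnostic, which is also why it cannot tell a letter from invisible whitespace.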

> I don't think we want to change the documentation here, because that
> would amount to a promise that we support such identifiers forever.
> I also don't think we want to change the code, because it opens up
> several problems and I'm not sure it's worth trying to solve them.

Right. IIRC, the SQL spec would have us allow only things that actually
are letters per Unicode or other relevant spec, but (1) that's rather
encoding-dependent and (2) the hit to parsing speed would likely be
non-negligible. Still, we might do it someday if someone can find
a way around those concerns. (Accepting whitespace, in particular,
is Not Great.) I think benign neglect in the docs is the best path.
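Both objections can be made concrete with a hypothetical stricter rule. The sketch below is an assumption for illustration only: it is neither the SQL standard's exact definition nor anything PostgreSQL implements. It decodes the input and then requires identifiers to start with a Unicode letter (general category L*) or underscore, continuing with letters, decimal digits (Nd), underscore, or $. Decoding requires knowing the encoding, and the same bytes give different answers under different encodings (point 1); each character also needs a decode step plus a category lookup rather than a single byte-range test (point 2).

```python
import unicodedata
from typing import Optional


def is_letter(ch: str) -> bool:
    # Unicode general categories Lu, Ll, Lt, Lm, Lo all begin with "L".
    return unicodedata.category(ch).startswith("L")


def strict_identifier(data: bytes, encoding: str) -> Optional[str]:
    """Hypothetical letter-only identifier rule (not PostgreSQL's)."""
    # Must decode first, so the result depends on the encoding chosen;
    # invalid input raises UnicodeDecodeError.
    text = data.decode(encoding)
    if not text or not (is_letter(text[0]) or text[0] == "_"):
        return None
    end = 1
    while end < len(text):
        ch = text[end]
        if is_letter(ch) or ch in "_$" or unicodedata.category(ch) == "Nd":
            end += 1
        else:
            break
    return text[:end]


# U+200B is category Cf (format), so it now ends the identifier.
print(strict_identifier(b"a\xe2\x80\x8bb", "utf-8"))
# Same bytes, different encodings, different identifiers.
print(strict_identifier("café".encode("utf-8"), "utf-8"))
print(strict_identifier("café".encode("utf-8"), "latin-1"))
```

Under Latin-1 the UTF-8 bytes for "é" decode to "Ã" (a letter) followed by "©" (a symbol), so the identifier is cut short; the byte-level scan.l rule would accept all of it either way.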

regards, tom lane

In response to

Re: Inaccurate documentation about identifiers at 2022-11-17 19:12:39 from Jeff Davis

Responses

Re: Inaccurate documentation about identifiers at 2022-11-17 22:47:32 from raf
