behavior difference in character literal vs national character literal

From: Brad DeJong <bpd0018(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: behavior difference in character literal vs national character literal
Date: 2017-07-28 16:38:38
Message-ID: CAJnrtnwyYxBcBhe0W04Nq3Godx0Ch3PyC-OVPREkGf_0jGztJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

postgresql 9.6.1 windows 64-bit, pgadmin4 1.6

I am working with a colleague (who mainly works with a different dbms) who
brought me the following question - why do these produce different results
...

> select 'a' || ' ' || 'b'
a b
> select 'a' || N' ' || 'b'
ab

(The answer in his specific case is that the database in question uses utf8
so just use character literals.)

Given that PostgreSQL is claiming compliance for SQL F421 National
Character, I then decided to dig in a bit more in order to answer the
question in general.

I ran "explain (analyze, verbose) select concat(concat('a', ' '), 'b')"
which shows ' ' as ' '.

Then I ran "explain (analyze, verbose) select concat(concat('a', N' '),
'b')" which shows N' ' as ''::bpchar.

I do not claim to be a lex/yacc/... export but I think that this is
happening because {xnstart} in src/backend/parser/scan.l emits an "NCHAR"
and then the literal. Then src/backend/parser/gram.y sees the explicit tag
followed by literal as "ConstTypename Sconst" and calls
makeStringConstCast(). Because ConstTypename does not have a "VARYING"
suffix, the cast is to bpchar which causes the trailing space to be
truncated when the || (concat) operator casts bpchar to text (via the
anytextcat(any, text) or textanycat(text, any) functions).

To provide evidence for/against that theory, I then ran

> select 'a' || char ' ' || 'b'
ab

and verified that this also trims the trailing space.

Given that PostgreSQL is silently converting NCHAR, NCHAR VARYING, ... to
CHAR, CHAR VARYING, ... it seems to me like national character literals
should also silently convert to character literals - with exactly the same
behavior in regards to trimming of trailing spaces during casts to ::text.

I looked through several old mailing list discussions on NCHAR support in
PostgreSQL and did not see anyone advocating that character literals should
be handled as varchar while national character literals should be handled
as bpchar.

It seems like the lexer could emit "NCHAR VARYING" followed by the literal
or it could just emit the literal. Either of those should result in the
national character literal being handled the same as a regular character
literal.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2017-07-28 19:02:01 Re: signal 11 segfaults with parallel workers
Previous Message Etsuro Fujita 2017-07-28 11:55:30 Re: [HACKERS] BUG #14759: insert into foreign data partitions fail