Dubious usage of TYPCATEGORY_STRING

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Dubious usage of TYPCATEGORY_STRING
Date: 2021-12-02 21:22:21
Message-ID: 2216388.1638480141@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The parser's type-coercion heuristics include some special rules
for types belonging to the STRING category, which are predicated
on the assumption that such types are reasonably general-purpose
string types. This assumption has been violated by a few types,
one error being ancient and the rest not so much:

1. The "char" type is labeled as STRING category, even though it's
(a) deprecated for general use and (b) unable to store more than
one byte, making "string" quite a misnomer.

2. Various types we invented to store special catalog data, such as
pg_node_tree and pg_ndistinct, are also labeled as STRING category.
This seems like a fairly bad idea too.

An example of the reasons not to treat these types as being
general-purpose strings can be seen at [1], where the "char"
type has acquired some never-intended cast behaviors. Taking
that to an extreme, we currently accept

regression=# select '(1,2)'::point::"char";
char
------
(
(1 row)

My first thought about fixing point 1 was to put "char" into some
other typcategory, but that turns out to break some of psql's
catalog queries, with results like:

regression=# \dp
ERROR: operator is not unique: unknown || "char"
LINE 16: E' (' || polcmd || E'):'
^
HINT: Could not choose a best candidate operator. You might need to add explicit type casts.

I looked briefly at rejiggering the casting rules so that that
would still work, but it looks like a mess. The problem is that
unknown || "char" can match either text || text or text || anynonarray,
and it's only the special preference for preferred types *of the
same typcategory as the input type* that allows us to prefer one
of those over the other.

Hence, what 0001 below does is to leave "char" in the string
category, but explicitly disable its access to the special
cast-via-I/O rules. This is a hack for sure, but it won't have
any surprising side-effects on other types, which changing the
general operator-matching rules could. The only thing it breaks
in check-world is that contrib/citext expects casting between
"char" and citext to work. I think that's not a very reasonable
expectation so I just took out the relevant test cases. (If anyone
is hot about it, we could add explicit support for such casts in
the citext module ... but it doesn't seem worth the trouble.)

As for point 2, I haven't found any negative side-effects of
taking the special types out of the string category, so 0002
attached invents a separate TYPCATEGORY_INTERNAL category to
put them in.

Thoughts?

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAOC8YUcXymCMpC5d%3D7JvcwyjXPTT00WeebOM3UqTBreOD1N9hw%40mail.gmail.com

Attachment Content-Type Size
0001-hack-behavior-of-char-type.patch text/x-diff 5.0 KB
0002-dont-put-special-purpose-types-in-string-category.patch text/x-diff 3.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2021-12-02 21:31:17 Re: O(n) tasks cause lengthy startups and checkpoints
Previous Message Bossart, Nathan 2021-12-02 21:19:16 Re: O(n) tasks cause lengthy startups and checkpoints