Re: Incautious handling of overlength identifiers

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Incautious handling of overlength identifiers
Date: 2016-12-26 17:05:55
Message-ID: 6481.1482771955@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> On Sat, Dec 24, 2016 at 7:44 AM, Joe Conway <mail(at)joeconway(dot)com> wrote:
>> On 12/23/2016 12:44 PM, Tom Lane wrote:
>>> An alternative worth considering, especially for the back branches,
>>> is simply to remove the Assert in hashname(). That would give us
>>> the behavior that non-developers see anyway, which is that these
>>> functions always fail to match overlength names, whether or not
>>> the names would have matched after truncation. Trying to apply
>>> truncation more consistently could be left as an improvement
>>> project for later.

>> That sounds reasonable to me.

> +1 for just removing the assertion on back-branches. On HEAD, it seems
> right to me to keep the assertion. However it is not possible to just
> switch those routines from text to name as a table could be defined
> with its schema name. So at minimum this would require adjusting
> textToQualifiedNameList() & similar routines in charge of putting in
> shape the name lists.

textToQualifiedNameList() already does truncate, and I suspect everyplace
that deals in qualified names does as well. The problem places are those
where someone just took a text parameter, did text_to_cstring on it, and
started treating the result as a Name (e.g. by passing it to a catalog
lookup function). Since we've intentionally allowed C strings to be used
as Names in most places, it's not even immediately obvious that this is
wrong.

I'm currently inclined to think that removing the assertion from hashname()
is right even for HEAD, because it's not very relevant to the operation
of that routine (you'll get back some hash value even for overlength
strings), and because it's a pretty useless way of enforcing truncation.
It basically will only catch you if you try to do a syscache lookup to
resolve a name --- if you do a catalog heap or index scan, those paths
contain no such gotcha. And I'm disinclined to introduce one.

The real problem with trying to enforce this through length assertions
in low-level routines is that they'll only reveal a bug if you actually
happen to test the appropriate calling code path with an overlength name.
We've obviously failed to do that in the past and I have little faith that
we'd do it in the future.

So that's why I was thinking about whether we could do this through some
datatype-based approach, whereby we could hope to catch incorrect coding
reliably through compiler checks. But given our history of allowing C
strings as names, I'm afraid that any such change would be enormously
invasive and not worth the trouble.

Another idea worth considering is to just make the low-level functions
do truncation, ie the fix in hashname would look more like

- Assert(keylen < NAMEDATALEN);
+ if (keylen >= NAMEDATALEN)
+ keylen = pg_mbcliplen(key, keylen, NAMEDATALEN - 1);

and we'd need something similar in the name comparison functions.
But that would be slightly annoying from a performance standpoint.
Not so much the extra pg_mbcliplen calls, because we could assume
those wouldn't happen in any performance-interesting cases; but
there are no strlen calls in the name comparison functions right now,
so we'd have to add some, and those would get hit every time through.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2016-12-26 17:20:18 Re: proposal: session server side variables
Previous Message Pavel Stehule 2016-12-26 16:49:44 Re: proposal: session server side variables