Re: regclass and format('%I')

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jason Dusek <jason(dot)dusek(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: regclass and format('%I')
Date: 2015-03-15 06:18:13
Message-ID: CAKFQuwY0w-_gHcNsAEWVU83ti9joT55YaMhNRRmphVGT-pGeXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sat, Mar 14, 2015 at 8:29 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Jason Dusek <jason(dot)dusek(at)gmail(dot)com> writes:
> > It honestly seems far more reasonable to me that %s and %I should do
> > the exact same thing with regclass.
>
> You're mistaken. The operation of format() is first to convert the
> non-format arguments to text strings, using the output functions for their
> data types, and then to further process those text strings according to
> the format specifiers:
>
> %s -- no additional processing, just insert the string as-is.
> %I -- apply double-quoting transformation to create a valid SQL identifier.
> %L -- apply single-quoting transformation to create a valid SQL literal.
>
> In the case of regclass, the output string is already double-quoted
> as necessary, so applying %I to it produces a doubly double-quoted
> string which is almost certainly not what you want. But it's not
> format()'s job to be smarter than the user. If it tried to avoid
> an extra pass of double quoting, it would get some cases wrong,
> potentially creating security holes.
>
>
>
TBH ​I'm not all that convinced by this argument​.

First, it is not being smarter than the user but allowing the user to
generalize their problem so that they do not need to take the nature of the
input data into account and can write a semantically meaningful pattern
string instead. The risk of them incorrectly choosing between %s or %I and
opening a security hole seems higher - if not as widespread - than any
string logic we could apply.

Second, presupposing the the transformation of the input must be a single
"thing", and that we are doing the %I conversion based upon our own
internal (or SQL's at the matter may be) definition of what it means to
"quote an identifier", we should be capable of noticing that the provided
input is already a single "thing" which has been escaped according to said
rules.

​IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see how
it is possible ​for format to lay in a value at %I that is any more
insecure than the current behavior. If the input string already matches
that pattern then it could be output as-is without any additional risk and
with the positive benefit of making this case work as expected. The broken
case then exists when someone actually intends to name their identifier
<"something"> which then correctly becomes <"""something"""> on output.

Since there is a behavior change involved there needs to be a convincing
use-case for the new behavior in order to justify the effort to change it.

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Seref Arikan 2015-03-15 08:25:30 Re: is there a relationship between indexes and temporary file creation?
Previous Message Pavel Stehule 2015-03-15 05:20:20 Re: regclass and format('%I')