Escape handling in COPY, strings, psql

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Escape handling in COPY, strings, psql
Date: 2005-05-29 03:58:01
Message-ID: 200505290358.j4T3w1n25524@candle.pha.pa.us
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Here is an updated version of the COPY \x patch. It is the first patch
> > attached.
> > Also, I realized that if we support \x in COPY, we should also support
> > \x in strings to the backend. This is the second patch.
>
> Do we really want to do any of these things? We've been getting beaten
> up recently about the fact that we have non-SQL-spec string escapes
> (ie, all the backslash stuff) so I'm a bit dubious about adding more,
> especially when there's little if any demand for it.

I thought about that, but adding additional escape letters isn't our
problem --- it is the escape mechanism itself that is the issue.

I have wanted to post on this issue, so now is a good time. I think we
have been validly beaten up in that we pride ourselves on standards
compliance but have an escape requirement on all strings. Our string
escapes are a major problem --- not the number of them but the
requirement to double backslashes on input, like 'C:\\tmp'. I am
thinking the only clean solution is to add a special keyword like ESCAPE
before strings that contain escape information. I think a GUC is too
general. If the string is a constant, you know whether it contains
escapes just by looking at it, and if it is a variable, hopefully you
know whether it has escapes.

Basically, I think we have to deal with this somehow. I think it could
be implemented by looking for the ESCAPE keyword in parser/scan.l and
handling it all in there by ignoring backslash escapes unless ESCAPE
precedes the string. By the time you are in gram.y, it is too late.
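
To make the proposal concrete, here is a minimal Python sketch of the
decision the lexer would have to make. It assumes backslash escapes are
processed only when the ESCAPE keyword marks the string, as proposed
above. The names decode_literal, escape_keyword and SIMPLE_ESCAPES are
hypothetical and only illustrate the rule; this is not how scan.l is
written.

```python
# Hypothetical illustration of the proposed rule, not PostgreSQL code.
# Only a handful of single-character escapes are modelled here.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f"}


def decode_literal(body: str, escape_keyword: bool) -> str:
    """Decode the text found between the quotes of a string literal.

    escape_keyword is True when the (assumed) ESCAPE keyword preceded
    the literal in the query text.
    """
    if not escape_keyword:
        # Spec-style path: a backslash is an ordinary character and the
        # only special sequence is a doubled single quote.
        return body.replace("''", "'")

    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Known escapes map to control characters; any other
            # escaped character (including \\ and \') stands for itself.
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
        elif ch == "'" and body[i + 1:i + 2] == "'":
            out.append("'")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Under this sketch, decode_literal(r"C:\tmp", False) keeps the backslash,
so a Windows path needs no doubling, while the same body decoded with
escape_keyword=True turns \t into a tab. That difference is why the
choice has to be made while the literal is being scanned.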

> I don't object too much to the COPY addition, since that's outside any
> spec anyway, but I do think we ought to think twice about adding this
> to SQL literal handling.
>
> > Third, I found out that psql has some unusual handling of escaped
> > numbers. Instead of using \ddd as octal, it treats \ddd as decimal, \0ddd
> > as octal, and \0xddd as hexadecimal. It is basically following the strtol()
> > rules for an escaped value. This seems confusing and contradicts how
> > the rest of our system works.
>
> I agree, that's just going to confuse people.
>
> > ! xqescape [\\][^0-7x]
>
> If you are going to insist on this, at least make it case-insensitive.
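
The strtol()-style rule for psql number escapes described in the quoted
text above can be sketched as follows. This is a Python imitation of the
prefix rules C's strtol() applies with base 0, written only to show why
the same digits give different characters. The function names are
hypothetical and this is not psql's code.

```python
def strtol_base0(digits: str) -> int:
    """Imitate the prefix rules of C strtol(s, NULL, 0).

    Unlike the real strtol(), this raises ValueError on an invalid
    digit instead of stopping at it.
    """
    if digits[:2].lower() == "0x":
        return int(digits[2:], 16)   # 0x prefix: hexadecimal
    if len(digits) > 1 and digits.startswith("0"):
        return int(digits, 8)        # leading 0: octal
    return int(digits, 10)           # otherwise: decimal


def octal_escape(digits: str) -> int:
    """The convention used elsewhere: \\ddd is always octal."""
    return int(digits, 8)
```

With these rules, the digits 101 mean character code 101 ('e') under
the strtol()-style reading but 65 ('A') under the plain octal reading.
To get 'A' the strtol() way you would have to write 0101 or 0x41.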

The submitted COPY patch was also case-insensitive, accepting both \x
and \X, but I changed that because we are case-sensitive for all
backslashes in COPY, and C is the same (\n and \N are different too, so
we actually use the case-sensitivity). Should we allow \X just so it is
case-insensitive like the SQL specification X'4f'? That is the only
reason I can think of for it to be case-insensitive, but we would then
have to do that at all levels, and I am not sure it makes sense.
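
As an illustration of why case matters in COPY, here is a small Python
sketch of decoding one field of COPY text input. It assumes the default
null marker \N, the lowercase-only \x behaviour described above, and
that an unrecognized escaped character stands for itself. The name
decode_copy_field is hypothetical and the sketch covers only a few
escapes; it is not the backend's code.

```python
import string

HEX_DIGITS = set(string.hexdigits)
SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f", "v": "\v"}


def decode_copy_field(raw: str):
    """Decode one field of COPY text input (illustrative subset)."""
    if raw == "\\N":
        return None                      # \N alone is the NULL marker
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch != "\\" or i + 1 == len(raw):
            out.append(ch)               # ordinary char or lone backslash
            i += 1
            continue
        nxt = raw[i + 1]
        i += 2
        if nxt == "x":                   # lowercase only: \X is not hex
            j = i
            while j < len(raw) and j - i < 2 and raw[j] in HEX_DIGITS:
                j += 1
            if j > i:                    # one or two hex digits followed
                out.append(chr(int(raw[i:j], 16)))
                i = j
                continue
        # Known escapes map to control characters; anything else,
        # including X, is taken literally.
        out.append(SIMPLE.get(nxt, nxt))
    return "".join(out)
```

Here \n decodes to a newline while \N is NULL, and \x41 decodes to 'A'
while \X41 is just the three characters X41. Accepting \X as hex would
be the one place where COPY ignores case.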

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
