Re: invalidly encoded strings

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: invalidly encoded strings
Date: 2007-09-10 12:16:35
Message-ID: 46E535A3.1040907@dunslane.net
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Tom Lane wrote:
>>> In the short run it might be best to do it in scan.l after all.
>
>> I have not come up with a way of doing that and handling the bytea case.
>
> AFAICS we have no realistic choice other than to reject \0 in SQL
> literals; to do otherwise requires API changes throughout that stack of
> modules. And once you admit \0 is no good, it's not clear that
> \somethingelse is any better for bytea-using clients. Moreover, given
> that we are moving away from backslash escapes as fast as we can sanely
> travel, expending large amounts of energy to make them work better
> doesn't seem like a good use of development manpower.

Perhaps we're talking at cross purposes.

The problem with doing encoding validation in scan.l is that the scanner
lacks context. Null bytes are only the tip of the bytea iceberg, since
any sequence of bytes can be valid for a bytea. So we can only validate
the encoding of a literal when we know it isn't destined for a bytea,
and I haven't come up with a way of knowing that in the scanner (and, as
you noted upthread, it's getting pretty darn late in the cycle for us to
be looking for ways to do things).

I still don't see why it's OK for us to do validation in the foo_recv
functions but not in the corresponding foo_in functions. At least in the
short term, that would give us fairly complete protection against
accepting invalidly encoded data into the database, once we fix up
chr(), without having to mess with the scanner, parser, COPY code, etc.
We could still get corruption from UDFs and UDTs; it's hard to see how
we can avoid that danger. Yes, we would need to make sure that any
additions to the type system behaved properly, and yes, we should fix
any validation inefficiencies (such as possibly inlining calls in the
UTF8 validation code). Personally, I don't see those as killer
objections.
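To make the proposal concrete, here is a minimal C sketch, assuming a UTF-8 server encoding, of doing the check per type at input time rather than in the scanner. The names `target_type`, `utf8_is_valid` and `type_input_accepts` are hypothetical; this is not PostgreSQL's actual `textin`/`byteain` code or its own validation routines. The point is only that an input function knows its target type, so text can reject a bad byte sequence while bytea accepts anything.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical target types; the real type system is far richer. */
typedef enum { TARGET_TEXT, TARGET_BYTEA } target_type;

/* Strict UTF-8 check (hypothetical helper, not the backend's validator):
 * rejects NUL, truncated sequences, overlong forms, surrogates and
 * code points above U+10FFFF. */
static bool utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;        /* number of continuation bytes expected */
        unsigned int cp; /* code point being decoded */

        if (c == 0x00)
            return false;              /* text cannot hold NUL */
        if (c < 0x80) {                /* plain ASCII byte */
            i++;
            continue;
        }
        if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return false;              /* 0x80-0xC1, 0xF5-0xFF never lead */
        }

        if (len - i - 1 < n)
            return false;              /* sequence runs past the end */
        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;          /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Overlong 3- and 4-byte forms, UTF-16 surrogates, out of range. */
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;

        i += n + 1;
    }
    return true;
}

/* The scanner sees only bytes; whether they are acceptable depends on
 * the type the value is finally read as, which the input function knows. */
static bool type_input_accepts(target_type t, const unsigned char *s,
                               size_t len)
{
    if (t == TARGET_BYTEA)
        return true;                   /* any byte sequence is a valid bytea */
    return utf8_is_valid(s, len);      /* text must be validly encoded */
}
```

With this placement, a lone `0xC3` byte or an embedded NUL is rejected when read as text but accepted when read as bytea. The cost is the one acknowledged above: every new text-like type's input function has to remember to call the validator.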

cheers

andrew
