Re: [GENERAL] string_to_array with empty input

From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] string_to_array with empty input
Date: 2009-04-01 14:38:50
Message-ID: 20090401143850.GA11167@frubble.xen.chris-lamb.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

[ oops, forgot to send this to -hackers before ]

On Tue, Mar 31, 2009 at 05:08:45PM +0100, Greg Stark wrote:
> Both interpretations are clearly consistent but it depends on whether
> you think it's a bunch of text strings concatenated together or if
> it's a list of objects.
>
> The example of string_to_array('',',')::int[] is relevant to this
> point. The whole "there's one empty element" only makes sense if
> you're thinking in terms of string processing. If it's a list of any
> other kind of object it probably doesn't make sense; you can't say
> there's one empty integer or one empty composite object or one empty
> anything else.

I think this is about the only sensible option, but my reasoning is
somewhat different.

My original intuition was that array_to_string and string_to_array
should be (perfect) inverses of each other. Unfortunately I can't see
any way to get this to happen; zero length arrays or NULL elements
in the array mean than array_to_string isn't injective. This means
that the composition of the two functions won't result in an injective
function and my original premise is false. Note that as far as I can
tell string_to_array is injective. I'm assuming that the delimiter
won't appear as part of an element of the array; e.g. an array of
integers and space as a delimiter is OK, but using the same delimiter
with unconstrained text is not OK, a blank delimiter is never OK as it's
always part of a string.

"Injective" means there exists more than one array that encodes to the
same string. The examples being how do you sanely encode '{}' and
'{NULL}' in a unique way; '{""}' is a bad example because it's just
an artifact of how strings are represented. The complications needed
to allow this to happen would make it a very similar function as the
array's normal output_function function and hence wouldn't serve a
useful purpose. All of this implies that we have to make a compromise
somewhere.

The semantics that most closely match the existing behaviour would be;
for array_to_string:

1) remove NULL values from input array
2) call output_function on remaining elements
3) intersperse[1] the delimiter between the remaining elements
4) concatenate the resulting array

for string_to_array:

1) check if input is zero length; return empty array
2) split array based on delimiter and return

Note that both functions are STRICT; i.e. a NULL for either parameter
should cause the function to return NULL. Arguably in string_to_array
it could be non-strict if the input string is empty, but it's probably
worth keeping it strict to simplify the semantics.

Here are some examples:

array_to_string('{}'::TEXT[],',') => ''
array_to_string('{NULL}'::TEXT[],',') => ''
array_to_string('{NULL,NULL}'::TEXT[],',') => ''
array_to_string('{a,NULL}'::TEXT[],',') => 'a'
array_to_string('{NULL,a}'::TEXT[],',') => 'a'
array_to_string('{a,b}'::TEXT[],',') => 'a,b'
array_to_string('{a,NULL,b}'::TEXT[],',') => 'a,b'

string_to_array('',',') => '{}'
string_to_array(' ',',') => '{" "}'
string_to_array(',',',') => '{"",""}'
string_to_array('a',',') => '{a}'
string_to_array('a,',',') => '{a,""}'
string_to_array(',a',',') => '{"",a}'
string_to_array('a,b',',') => '{a,b}'

My thinking before was that it should be doing:

string_to_array('',',') => '{""}'

instead, but I now think that Greg has a point and these are nicer/more
generally useful semantics.

Hum, that all got a bit more verbose than I was expecting. Ah well, I
hope it's somewhat useful.

--
Sam http://samason.me.uk/

[1] as in the intersperse function in Haskell
http://www.haskell.org/onlinereport/list.html#sect17.3
intersperse "#" ["a", "bar"] == ["a", "#", "bar"]
note that here we're working with arrays of string, rather than
arrays of characters as in the report.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Justin Funk 2009-04-01 15:14:45 Retain PREPARE or connect trigger
Previous Message Laurent Rahuel 2009-04-01 14:38:10 Statement trigger on delete object OID

Browse pgsql-hackers by date

  From Date Subject
Next Message Vincze, Tamas 2009-04-01 14:47:51 Patch to speed up pg_dump
Previous Message Sam Mason 2009-04-01 14:00:06 Re: string_to_array with empty input