Re: proposal - function string_to_table

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal - function string_to_table
Date: 2020-08-20 02:06:42
Message-ID: CAHut+PsKQu429JKc3ynysfMXrD2G7wVtLW+EytO+GX-6Jw=UwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

I have been looking at the patch: string_to_table-20200706-2.patch

Below are some review comments for your consideration.

====

COMMENT func.sgml (style)

+ <para>
+ splits string into table using supplied delimiter and
+ optional null string.
+ </para>

The format style of the short description is inconsistent with the
other functions.
e.g. Should start with Capital letter.
e.g. Should tag the parameter names properly

Something like:
<para>
Splits <parameter>string</parameter> into a table
using supplied <parameter>delimiter</parameter>
and optional null string <parameter>nullstr</parameter>.
</para>

====

COMMENT func.sgml (what does nullstr do)

The description does not sufficiently describe the purpose/behaviour
of the nullstr.

e.g. Firstly I thought that it meant if 2 consecutive delimiters were
encountered it would substitute this string as the row value. But it
is doing the opposite of what I guessed - if the extracted row value
is the same as nullstr then a NULL row is inserted instead.

====

COMMENT func.sgml (wrong sample output)

+<programlisting>xx
+yy,
+zz</programlisting>

This output is incorrect for the sample given. There is no "yy," in
the output because there is a 'yy' nullstr substitution.

Should be:
---
xx
NULL
zz
---

====

COMMENT func.sgml (related to regexp_split_to_table)

Because this new function is similar to the existing
regexp_split_to_table, perhaps they should cross-reference each other
so a reader of this documentation is made aware of the alternative
function?

====

COMMENT (test cases)

It is impossible to tell difference in the output between empty
strings and nulls currently, so maybe you can change all the tests to
have a form like below so they can be validated properly:

# select v, v IS NULL as "is null" from
string_to_table('a,b,*,c,d,',',','*') g(v);
v | is null
---+---------
a | f
b | f
| t
c | f
d | f
| f
(6 rows)

or maybe like this is even easier:

# select quote_nullable(string_to_table('a|*||c|d|','|','*'));
quote_nullable
----------------
'a'
NULL
''
'c'
'd'
''
(6 rows)

Something similar was already proposed before [1] but that never got
put into the test code.
[1] https://www.postgresql.org/message-id/CAFj8pRDSzDYmaS06dfMXBfbr8x%2B3xjDJxA5kbL3h8%2BeOGoRUcA%40mail.gmail.com

====

COMMENT (test cases)

There are multiple combinations of the parameters to this function and
MANY different results depending on different values they can take, so
the existing tests only cover a small sample.

I have attached a lot more test scenarios that you may want to include
for better test coverage. Everything seemed to work as expected.

PSA test-cases.pdf

====

COMMENT (accum_result)

+ Datum values[1];
+ bool nulls[1];
+
+ values[0] = PointerGetDatum(result_text);
+ nulls[0] = is_null;

Why not use variables instead of arrays with only 1 element?

====

COMMENT (text_to_array_internal)

+ if (!tstate.astate)
+ PG_RETURN_ARRAYTYPE_P(construct_empty_array(TEXTOID));

Maybe the condition is more readable when expressed as:
if (tstate.astate == NULL)

====

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
test-cases.pdf application/pdf 456.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-08-20 02:09:42 Re: Creating a function for exposing memory usage of backend process
Previous Message Amit Langote 2020-08-20 01:50:29 Re: Creating foreign key on partitioned table is too slow