Re: proposal - function string_to_table

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "movead(dot)li(at)highgo(dot)ca" <movead(dot)li(at)highgo(dot)ca>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal - function string_to_table
Date: 2020-08-20 19:21:10
Message-ID: CAFj8pRD+W3YcNm72+VbMXH1dzi0m4e1BHppfpwjiDP6nVVM8og@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

čt 20. 8. 2020 v 4:07 odesílatel Peter Smith <smithpb2250(at)gmail(dot)com> napsal:

> Hi.
>
> I have been looking at the patch: string_to_table-20200706-2.patch
>
> Below are some review comments for your consideration.
>
> ====
>
> COMMENT func.sgml (style)
>
> + <para>
> + splits string into table using supplied delimiter and
> + optional null string.
> + </para>
>
> The format style of the short description is inconsistent with the
> other functions.
> e.g. Should start with Capital letter.
> e.g. Should tag the parameter names properly
>
> Something like:
> <para>
> Splits <parameter>string</parameter> into a table
> using supplied <parameter>delimiter</parameter>
> and optional null string <parameter>nullstr</parameter>.
> </para>
>
>
done

> ====
>
> COMMENT func.sgml (what does nullstr do)
>
> The description does not sufficiently describe the purpose/behaviour
> of the nullstr.
>
> e.g. Firstly I thought that it meant if 2 consecutive delimiters were
> encountered it would substitute this string as the row value. But it
> is doing the opposite of what I guessed - if the extracted row value
> is the same as nullstr then a NULL row is inserted instead.
>
>
done

> ====
>
> COMMENT func.sgml (wrong sample output)
>
> +<programlisting>xx
> +yy,
> +zz</programlisting>
>
> This output is incorrect for the sample given. There is no "yy," in
> the output because there is a 'yy' nullstr substitution.
>
> Should be:
> ---
> xx
> NULL
> zz
> ---
>

fixed

> ====
>
> COMMENT func.sgml (related to regexp_split_to_table)
>
> Because this new function is similar to the existing
> regexp_split_to_table, perhaps they should cross-reference each other
> so a reader of this documentation is made aware of the alternative
> function?
>

I wrote new sentence with ref

>
> ====
>
> COMMENT (test cases)
>
> It is impossible to tell difference in the output between empty
> strings and nulls currently, so maybe you can change all the tests to
> have a form like below so they can be validated properly:
>
> # select v, v IS NULL as "is null" from
> string_to_table('a,b,*,c,d,',',','*') g(v);
> v | is null
> ---+---------
> a | f
> b | f
> | t
> c | f
> d | f
> | f
> (6 rows)
>
> or maybe like this is even easier:
>
> # select quote_nullable(string_to_table('a|*||c|d|','|','*'));
> quote_nullable
> ----------------
> 'a'
> NULL
> ''
> 'c'
> 'd'
> ''
> (6 rows)
>

I prefer the first variant, it is clean. It is good idea, done

> Something similar was already proposed before [1] but that never got
> put into the test code.
> [1]
> https://www.postgresql.org/message-id/CAFj8pRDSzDYmaS06dfMXBfbr8x%2B3xjDJxA5kbL3h8%2BeOGoRUcA%40mail.gmail.com
>
> ====
>
> COMMENT (test cases)
>
> There are multiple combinations of the parameters to this function and
> MANY different results depending on different values they can take, so
> the existing tests only cover a small sample.
>
> I have attached a lot more test scenarios that you may want to include
> for better test coverage. Everything seemed to work as expected.
>

ok, merged

> PSA test-cases.pdf
>
> ====
>
> COMMENT (accum_result)
>
> + Datum values[1];
> + bool nulls[1];
> +
> + values[0] = PointerGetDatum(result_text);
> + nulls[0] = is_null;
>
> Why not use variables instead of arrays with only 1 element?
>

Technically it is equivalent, but I think so using one element array is
more correct, because function heap_form_tuple expects an array. Sure in C
language there is no difference between pointer to value or pointer to
array, but minimally the name of the argument "values" implies so argument
is an array.

This pattern is used more times in Postgres. You can find a fragments where
although we know so array has only one field, still we works with array

misc.c
hash.c
execTuples.c

but I can this code simplify little bit - I can use function
tuplestore_putvalues(tupstore, tupdesc, values, nulls);

I see, so this code can be reduced more, and I don't need local variables,
but I prefer to be consistent with other parts, and I feel better if I pass
an array where the array is expected.

This is not extra important, and I can it change, just I think this variant
is cleaner little bit

> ====
>
> COMMENT (text_to_array_internal)
>
> + if (!tstate.astate)
> + PG_RETURN_ARRAYTYPE_P(construct_empty_array(TEXTOID));
>
> Maybe the condition is more readable when expressed as:
> if (tstate.astate == NULL)
>
>
done

new patch attached

Thank you for precious review

Regards

Pavel

====
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia
>

Attachment Content-Type Size
string_to_table-20200820.patch text/x-patch 28.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Roger Pack 2020-08-20 20:38:38 Re: deb repo doesn't have latest. or possible to update web page?
Previous Message Ashwin Agrawal 2020-08-20 18:34:08 Re: language cleanups in code and docs