Re: custom function for converting human readable sizes to bytes

From: Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Guillaume Lelarge <guillaume(at)lelarge(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: custom function for converting human readable sizes to bytes
Date: 2016-01-26 05:25:34
Message-ID: CAKOSWNm+SSgGKYsv09n_6HhfWzmRr+cSjpgfhBPFf5nNPfJ5JQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello!

I have reviewed this patch. It applies and compiles cleanly at the
current master 1129c2b0ad2732f301f696ae2cf98fb063a4c1f8 and implements
the behavior reached by a consensus.

All size units (the same as used in the GUC) are supported.

The documentation is present and describes behavior of the function.

The code is mostly clear and commented enough, but I doubt about
design (see p.3).

Regression tests are present (see p.II).

pg_proc.h has changed, so the CATALOG_VERSION_NO must be also changed.

Code comments, error message string, DESCR in pg_proc.h and doc
changes need proof-reading by a native English speaker, which I am
not.

===
The last patch fixes only part of my last notes, so I can only return
it to the next round with the same notes.

The list of my notes has not been done (all points listed below was
mentioned in my last review[1]):

1. The documentation has duplicate parts of size units (in table and
in the text). I think that one of them should be deleted (from the
table).

2. Test has a note about sharing(using) size units via(from)
memory_unit_conversion_table and a requirement to update the
documentation when new units are supported, but the note will not be
seen because there is no tests for that fail for units ("PB", "EB",
"ZB") which will change in the future by the
"memory_unit_conversion_table".

3. Code design is looked a little odd for me.

3a. There is an attempt to check a correctness of a numeric value
instead of using numeric_in which does the same thing. In the last
case the code looks more compact and nice (see the full explanation in
the previous review[1] and both PoC there).

Possibly that way is chosen to emit its own error messages ("invalid
size:" instead of "invalid input syntax for type numeric:"), but it
still leads errors like 'invalid unit: "+ kB"' for input value "+912+
kB" (see "dbsize.out").

3b. There is an extra copying from the buffer "str" to the "buffer"
(it can be avoided and shown in an PoC "single_buffer.c").

3c. There is freeing buffers "str" and "buffer" that looks as a
counter-design since MemoryContext frees them at once with other
expired buffers (like "num" from numeric_in and numeric_mul; also
"mul_num" from int8_numeric)

===
While I was writing that text I found five additional notes. Both
parts (above and below) are important (see concusion).

I. The new version of the patch (pg-size-bytes-10.patch) has the
simplest typo in an updated (from v9) row "inavalid size".

II. There is no test for an empty input value, but it works as
expected (an error 'invalid size: ""').

III. There is no support of 'bytes' unit, it seems such behavior got
majority approval[2].

IV. Code design of the function "parse_memory_unit" is returning
"bool" value and set "multiplier" via pointer. But while Tom Lane was
reviewed my patch[3] in the similar case he changed code that now
returns (see NonFiniteTimestampTzPart) not a flag, but a value (where
zero still means an error). Be honest it looks better, but the current
code is written like nearby functions.

V. The documentation lacks a note that the base of the "pg_size_bytes"
is 1024 whereas the base of the "pg_size_pretty" is 1000.

===
Concusion:

I *can't* mark this patch as "Ready for Committer" with all previous
notes, and I ask experienced hackers for a help.
I'm not a strong Postgres hacker yet and sending to the next round
with the same notes seems a nitpicking for me.

So questions to the hackers:

a. Are the rest of the notes from the previous review[1] (and listed
in pp.1-3c) a nitpicking or should be done?

b. Does the function in PoC "single_buffer.c" look more complex or easer?

c. Is it worth for p.II to change error message like 'Empty input
value is not a valid size for pg_size_bytes: "%s"'?

d. Should the function "pg_size_bytes" handle "bytes" as a size unit
(see p.III)?
There was no disagreement, but if I saw that thread before I became a
reviewer and saw silent approval I would have argued. Moreover
supporting of negative sizes was explained[4] by a symmetry for the
"pg_size_pretty", which should include all size units produced by the
"pg_size_pretty". On the other hand those functions can't be symmetric
due to difference in bases (see p.V and [5])

e. What way is preferred for p.IV?

f. If p.V is actual, how to write it in the documentation (be honest
I've no idea)? Should a description of the "pg_size_pretty" be changed
too?

[1]http://www.postgresql.org/message-id/CAKOSWNn=p8GX-P5Y-B4t4dg-aAHaTup+CrG41hQ9BeobZwX3KQ@mail.gmail.com
[2]http://www.postgresql.org/message-id/CACACo5QW7fFsFfhKsTjtYcP4QF3Oh9zA14SC6Z3DXx2yssJjSw@mail.gmail.com
[3]http://git.postgresql.org/pg/commitdiff/647d87c56ab6da70adb753c08d7cdf7ee905ea8a
[4]http://www.postgresql.org/message-id/CA+TgmoZFomG4eYorZZGf7pzotG9PxpUhtQvxLfsKiM4iZH8KRQ@mail.gmail.com
[5]http://www.postgresql.org/message-id/CA+TgmoYgS_xixUQy+xyw9futtMrTbC_S0c1C0_wOBwVbNf8aZA@mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2016-01-26 06:00:43 Re: why pg_size_pretty is volatile?
Previous Message Vinayak Pokale 2016-01-26 02:22:44 Re: [PROPOSAL] VACUUM Progress Checker.