Re: custom function for converting human readable sizes to bytes

From: Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Guillaume Lelarge <guillaume(at)lelarge(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: custom function for converting human readable sizes to bytes
Date: 2016-01-22 06:03:07
Message-ID: CAKOSWNn=p8GX-P5Y-B4t4dg-aAHaTup+CrG41hQ9BeobZwX3KQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/20/16, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> ...
> New version is attached
>
> Regards
> Pavel

Hello!

1. Why the function is marked as VOLATILE? It depends only on input
value, it does change nothing in the DB.
I guess it should be IMMUTABLE for efficient caching its result.

2.
> text *arg = PG_GETARG_TEXT_PP(0);
...
> str = text_to_cstring(arg);
>
Hmm... My question was _why_ you get TEXT and convert it instead of
getting an argument of type cstring (not about usage of
VARSIZE_ANY_EXHDR).
It was enough to give me a link[1] and leave the usage of
VARSIZE_ANY_EXHDR as is.

I think all the other claims below are mostly cosmetic. Main behavior
is OK (considering its usage will not be in heavy queries).

===
Documentation:
Besides currently added row in a table I think it is worth to add
detailed comment after a block "<function>pg_size_pretty</> can be
used" similar to (but the best way is to get advice for a correct
phrase):
"pg_size_bytes can be used to get a size in bytes as bigint from a
human-readable format for a comparison purposes (it is the opposite to
pg_size_pretty function). The format is numeric with an optional size
unit: kB, MB, GB or TB."
(and delete unit sizes from the table row).

===
Tests:
Since there was a letter with an explanation why units bigger than
"TB" can't be used[2], it is a good reason to add that size units
("PB", "EB", "ZB") in tests with a note to update the documentation
part when that unit are supported.

===
Code style:
1. When you declare pointers, you must align _names_ by the left
border, i.e. asterisks must be at one position to the left from the
aligned names, i.e. one to three spaces instead of TAB before them.

2.
> int multiplier;
One more TAB to align it with the name at the next line.

===
Code:

> /* allow whitespace between integer and unit */
May be "numeric" instead of "integer"?

> "\"%s\" is not in expected format"
It is not clear what format is expected.

> /* ignore plus symbol */
It seems it is not ignored, but copied... ;-)

> to ger hintstr
s/ger/get/
> Elsewhere is used space trimmed buffer.
I'd write it as "Otherwise trimmed value is used."
I'm not good at English, but that full block looks little odd for me.
I tried to reword it in the attachment single_buffer.c, but I don't
think it is enough.

> We allow ending spaces.
"trailing"?

> but this message can be little bit no intuitive - better text is "is not a valid number"
It was a block with a comment "don't allow empty string", i.e. absence
of a number...

> Automatic memory deallocation doesn't cover all possible situations where
> the function can be used - for example DirectFunctionCall - so explicit
> deallocation can descrease a memory requirements when you call these
> functions from C.
DirectFunctionCall uses a memory context of the caller. For example,
you don't free "num" allocated by numeric_in and numeric_mul, also
mul_num allocated by int8_numeric...
I'd ask experienced hackers for importance of freeing (or leaving)
"str" and "buffer".

> postgres=# select pg_size_bytes('+912+ kB');
> ERROR: invalid unit: "+ kB"
> This is difficult - you have to divide string to two parts and first world is number, second world is unit.
Your code looks like a duplicated (not by lines, but by behavior).
numeric_in has similar checks, let it do them for you. The goal of
your function is just split mantissa and size unit, and if the last
one is found, turn it into an exponent.
See my example in the attachment check_mantissa_by_numeric_in.c. There
is just searching non-space char that can't be part of numeric. Then
all before it passes to numeric_in and let it check anything it
should. I even left first spaces to show full numeric part to a user
if an error occurs (for any case).
The second attachment single_buffer.c is an attempt to avoid
allocating the second buffer (the first is "str" allocated by
text_to_cstring) and copying into it. I don't think it gives a big
improvement, but nevertheless.

===
[1] http://www.postgresql.org/message-id/29618.1451882238@sss.pgh.pa.us
[2] http://www.postgresql.org/message-id/CAB7nPqS6Wob4WnZb=DHB3O0Pc-nX1v3xJSzKN3z9KBeXgcQTRg@mail.gmail.com

--
Best regards,
Vitaly Burovoy

Attachment Content-Type Size
check_mantissa_by_numeric_in.c text/x-csrc 2.7 KB
single_buffer.c text/x-csrc 2.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-01-22 06:17:51 Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
Previous Message Michael Paquier 2016-01-22 05:45:57 Re: silent data loss with ext4 / all current versions